ABSTRACT


The Speech Systems Technology Group at Lincoln Laboratory has been developing an automatic speech-to-speech translation system for English-to-Korean translation. For the machine translation module, an interlingua system has been adopted. This system analyzes the source language text and represents the results of the analysis in a semantic frame, a representation of the meaning of the input text from which the text in the target language is generated. GENESIS, a language generation system developed at the Spoken Language Systems Group at the Laboratory for Computer Science at MIT, has been used for the language generation component. GENESIS had previously been used in limited domains for European languages and for Japanese. The work reported here shows that GENESIS is capable of generating a wide range of Korean sentences. It explores the degree to which GENESIS can handle Korean as a target language in a machine translation system and describes the extensions to GENESIS that are needed to produce Korean output.
1. INTRODUCTION


1.1  Machine  Translation  Systems 

The Speech Systems Technology Group at Lincoln Laboratory has been developing an automatic speech-to-speech translation system (SSTS) for English and Korean. It has been proposed that the English-Korean SSTS be used by military coalition forces in Korea, where Korean and American soldiers must communicate. The purpose of building the system is to let the soldiers communicate in their respective languages. Reaching this goal is difficult because of the many differences between the two languages.

A typical SSTS works in three phases: speech recognition, language translation, and speech synthesis [1]. The first phase recognizes the speech in the source language (SL) and produces the utterance in text form. The second phase analyzes the utterance and translates it into text in the target language (TL). The last phase converts the translated text into sound. See Figure 1.
Figure 1. A typical SSTS.
Most machine translation systems developed to date fall into two categories, depending on how the language translation is approached: transfer and interlingua [1]. Transfer systems find target-language correlates for the lexical units and syntactic constructions of the source language, whereas in interlingua systems the SL and TL are never in direct contact. Interlingua systems analyze the source language text and represent the results of the analysis in interlingua text (ILT), an unambiguous, propositional language for representing textual meaning [2], from which the text in the TL is generated.

The ILT approach has been chosen at MIT Lincoln Laboratory. The system consists of analysis and generation programs [1]. The source language text is processed by a text analysis program, which uses knowledge of the SL grammar and lexicon to produce ILT. The ILT is passed to the generation program, which then produces the output translation in the target language using the TL lexicon and grammar. See Figure 2.

A robust ILT system assumes access to complete knowledge sources for each language it processes, and it also assumes that the IL can adequately represent the meaning of the SL. This assumption is crucial when the approach is applied to some Asian languages, such as Korean or Japanese [3], which have various styles of speech indicating the relative positions, sexes, and ages of the speaker and listener. The differences among these styles can be very complex. Therefore, when such languages are used as TLs, even a simple English word like “hi” may have to be mapped onto a very complex form in order to capture its meaning. For translation within a limited domain, it may be possible to simplify the analysis.
Figure 2. Translation using the ILT approach [1].
1.2 CCLINC


Common Coalition Language at LINColn Laboratory (CCLINC) is a system architecture and concept demonstration for automatic speech-to-speech translation in limited-domain multilingual applications [4]. The proposed application of this system is the coalition battle management environment. The system translates speech in one of three languages (English, French, or Korean) into one of the other two, with both the source and target languages using a Common Coalition Language (CCL) as a military interlingua [4].

Figure 3 depicts the planned structure of CCLINC. For each language, the architecture includes a module consisting of speech recognition, natural language understanding, language generation, and speech synthesis. Each of these modules produces a meaning representation in the form of a semantic frame. The semantic frames are transmitted via a Common Coalition Language network and used as input to the language generator of a different language [4].
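The hub-and-spoke flow described above, in which one analysis feeds the generators of every other language, can be sketched as follows. This is a minimal illustration only: `ANALYZERS`, `GENERATORS`, and `translate` are hypothetical names, the frame is reduced to a plain dictionary, and the generators are stubs that merely tag the text instead of translating it. Nothing here reflects CCLINC's actual interfaces.

```python
from typing import Callable, Dict

Frame = dict  # stand-in for a CCL semantic frame

# Hypothetical per-language modules: an analyzer maps text to a frame,
# a generator maps a frame back to text. These stubs only tag the text.
ANALYZERS: Dict[str, Callable[[str], Frame]] = {
    "english": lambda text: {"clause": "statement", "text": text, "source": "english"},
}
GENERATORS: Dict[str, Callable[[Frame], str]] = {
    "english": lambda f: f"[EN] {f['text']}",
    "french": lambda f: f"[FR] {f['text']}",
    "korean": lambda f: f"[KO] {f['text']}",
}

def translate(text: str, source: str) -> Dict[str, str]:
    """Analyze once into a frame, then hand that same frame to every other generator."""
    frame = ANALYZERS[source](text)
    return {lang: gen(frame) for lang, gen in GENERATORS.items() if lang != source}

print(translate("This is delta", "english"))
```

Under this arrangement, adding a language means adding one analyzer and one generator rather than a separate translator for every language pair.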

The  vocabulary,  grammar,  and  semantics  of  CCLINC  are  specifically  designed  to  suit  brigade 
communications.  A  transcription  of  a  Task  Force  Command  Net  exercise  is  used  as  the  main  source 
in  providing  a  specification  of  command  and  control  message  formats.  It  contains  1400  utterances 
[4].  The  following  are  some  example  sentences: 


(1-a) Call some artillery

(1-b) Request permission to defend hilltop echo

(1-c) Enemy sighted at hilltop Charlie

(1-d) This is delta

(1-e) Let me get a grid from alpha and I will pass it to you


1.3  Korean  Language  in  CCLINC 

An ideal semantic frame perfectly extracts and represents all the fine details of speech in a source language. Even with such an ideal frame, some difficulties arise in dealing with Korean. The most prominent one stems from the various styles of speech in Korean that indicate the relative positions, sexes, and ages of the speaker and listener. The differences among these styles can be very complex, so when Korean is the target language, even a simple English word like “hi” becomes hard to translate. The knowledge sources used in the analysis phase may have to be very complex to capture the meaning in sufficient detail for translation into Korean, not to mention the need to infer the relative positions, sexes, and ages of the speaker and listener from the context of the source language. To illustrate how the ranks of the speaker and listener can affect even a simple phrase, consider the following variations of the Korean translation of the English phrase “trying to obtain.”
(3-a) GuHaRyeoNeun JungIYo - when speaking to a peer.

(3-b) GuHaRyeoNeun JungIDa - when speaking to a peer or someone of lower rank.

(3-c) GuHaRyeoNeun JungIYa - same as above.

(3-d) GuHaRyeoNeun JungIJyo - when speaking to a slightly older person.

(3-e) GuHaRyeoNeun JungIEoYo - when speaking to a superior.

(3-f) GuHaRyeoNeun JungIbNiDa - when speaking to a superior.

For the proposed task, however, the difficulty may be greatly reduced because the domain of usage is very limited. Within this limited domain, it is plausible to assume that the system will emulate the speech of an educated military male of middle rank talking to his peers.
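The variants in (3-a) through (3-f) amount to a lookup from the listener's standing to the final word of the phrase, and the limited-domain assumption fixes that standing to "peer." The sketch below encodes this table. The relation labels, `DEFAULT_RELATION`, and `render` are hypothetical names; the final words are copied from the examples above.

```python
# Final words for "GuHaRyeoNeun ..." ("trying to obtain"), keyed by the
# listener's standing relative to the speaker (examples 3-a to 3-f).
ENDINGS = {
    "peer": ["JungIYo", "JungIDa", "JungIYa"],
    "lower": ["JungIDa", "JungIYa"],
    "slightly_older": ["JungIJyo"],
    "superior": ["JungIEoYo", "JungIbNiDa"],
}

# Limited-domain assumption: a middle-rank speaker talking to his peers.
DEFAULT_RELATION = "peer"

def render(phrase: str, relation: str = DEFAULT_RELATION) -> str:
    """Append the first listed final word for the given speaker-listener relation."""
    return f"{phrase} {ENDINGS[relation][0]}"

print(render("GuHaRyeoNeun"))              # GuHaRyeoNeun JungIYo
print(render("GuHaRyeoNeun", "superior"))  # GuHaRyeoNeun JungIEoYo
```

Fixing the default relation is what lets the generator ignore the speaker-listener inference problem entirely within this domain.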

1.4  TINA  and  GENESIS 

For language understanding, the Speech Group at MIT Lincoln Laboratory has decided to use TINA, a system developed by the Spoken Language Systems Group (SLSG) at the Massachusetts Institute of Technology (MIT) Laboratory for Computer Science (LCS) [5]. TINA combines key ideas from context-free grammars, augmented transition networks, and unification [5].

For language generation, GENESIS, also developed at the SLSG, has been adopted [4]. GENESIS is driven by three tables: vocabulary, messages, and rewrite rules [4]. These tables are the parameters of the system; manipulating them controls the output sentences produced from a given ILT. By changing the tables, a different set of styles of Korean sentences can be generated from the same English sentence.

GENESIS has been used for European languages for general purposes and for Japanese in limited domains [4]. It is also capable of handling some of the linguistic phenomena that Korean requires. This report explores the degree to which GENESIS can handle Korean language generation and proposes modifications to further generalize GENESIS.


Figure 3. System structure for multilingual SSTS [6].
To begin measuring how well GENESIS extends to Korean generation, two assumptions have been made. First, the analysis phase of the translation system is assumed to have been executed correctly, and the IL is assumed to represent the meaning of the SL adequately. Second, Korean generation in a military context gives an upper bound on the performance of GENESIS, as it represents only a subset of the Korean language. The translation system models the language that educated, middle-ranking military personnel would use on the battlefield.

1.5  Evaluation  Procedure 

A transcription of a Task Force Command Net exercise was used to evaluate the performance of the system. Initially, a grammar was developed to understand a set of 530 sentences, 497 of which were taken from the exercise transcription. In the first evaluation, the one reported here, native Korean speakers evaluated CCLINC’s text translations of the 530 sentences. The parsed sentences were evaluated on the basis of adequacy (i.e., the closeness of meaning between the source and the target language) and fluency of the translation. This evaluation was carried out by four native Korean speakers, who scored each translation from 5 to 1, with 5 being the best and 1 the worst. The evaluation procedure is discussed in greater detail in Sections 5.1.1 and 5.1.2.
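The report does not say here how the four raters' scores are combined. One common choice, assumed for this sketch, is to average them per sentence and then over the test set, separately for adequacy and fluency. The scores below are made-up placeholders, not results from this evaluation, and `checked` and `summarize` are hypothetical helpers.

```python
from statistics import mean

# Hypothetical ratings: sentence id -> (adequacy scores, fluency scores),
# one score per native-speaker rater on the 1 (worst) to 5 (best) scale.
ratings = {
    "s1": ([5, 4, 5, 5], [4, 4, 5, 3]),
    "s2": ([3, 3, 2, 4], [3, 2, 3, 3]),
}

def checked(scores):
    """Reject anything other than four scores on the 1-5 scale."""
    if len(scores) != 4 or not all(1 <= s <= 5 for s in scores):
        raise ValueError(f"expected four scores in 1..5, got {scores}")
    return scores

def summarize(ratings):
    """Average each sentence over raters, then average those means over sentences."""
    adequacy = [mean(checked(a)) for a, _ in ratings.values()]
    fluency = [mean(checked(f)) for _, f in ratings.values()]
    return mean(adequacy), mean(fluency)

print(summarize(ratings))
```

Averaging per sentence first gives every sentence equal weight even if a rater skipped one; other aggregation schemes are equally plausible.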


2.  KOREAN  LANGUAGE  PHENOMENA 


2.1  Word  Order  Rules 

The basic word order in Korean is Subject-Object-Verb (SOV), clearly different from the Subject-Verb-Object (SVO) order of English. Some distinguishing characteristics of Korean word order are [3]:

1.  The  verb  comes  at  the  end  of  a  clause. 

2. Negation is represented by changes to the verb ending.

3.  Noun  phrases  are  followed  by  postpositions,  unlike  English  where  noun  phrases  are 
preceded  by  prepositions. 

4.  Modifiers  precede  the  word  that  is  modified. 

5.  Words  that  need  to  be  emphasized  are  usually  placed  close  to  the  verb. 

6.  When  word  A  modifies  word  B  and  word  C  modifies  word  D,  the  two  pairs  must  not 
cross  each  other.  The  word  order  A  C  B  D  violates  this  rule,  because  A-B  crosses 
C-D.  However,  A  C  D  B  satisfies  this  rule. 
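Rule 6 is a non-crossing constraint on modifier-head links. The check below treats each pair as two positions in the sentence and reports whether any two links interleave; the word labels follow the A, B, C, D example above, and the function names are illustrative.

```python
from itertools import combinations

def crosses(link1, link2):
    """True if two links (i, j) strictly interleave, i.e. cross when drawn as arcs."""
    a, b = sorted(link1)
    c, d = sorted(link2)
    return a < c < b < d or c < a < d < b

def satisfies_rule6(order, links):
    """order: words in sentence order; links: (modifier, head) word pairs."""
    pos = {w: i for i, w in enumerate(order)}
    spans = [(pos[m], pos[h]) for m, h in links]
    return not any(crosses(s, t) for s, t in combinations(spans, 2))

links = [("A", "B"), ("C", "D")]  # A modifies B, C modifies D
print(satisfies_rule6(["A", "C", "B", "D"], links))  # False: A-B crosses C-D
print(satisfies_rule6(["A", "C", "D", "B"], links))  # True: C-D is nested inside A-B
```

Nested or side-by-side links pass; only interleaved ones fail, which is exactly the distinction the rule draws between A C B D and A C D B.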

Rule 1, together with Rule 3, is the basic characteristic of Korean that works with the properties of postpositions to allow a wide variety of sentences having essentially the same lexical meaning but conveying subtly different contextual meanings. Because the verb comes at the end, all the preceding words can be scrambled freely in front of it. This scrambling, however, does not give rise to any confusion, as postpositions clearly identify which word fills which role in a sentence. It is therefore worth explaining what Korean postpositions do.
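Because the verb stays final and each noun phrase carries its own postposition, basic scrambling can be modeled as permuting the pre-verbal constituents. The sketch below does this for the constituents of example (2-m), shown later in this section. It assumes each constituent is already paired with its postposition, and it makes no attempt to rank the orders by emphasis; the function name `scrambles` is illustrative.

```python
from itertools import permutations

# Constituents of (2-m), each already carrying its postposition, plus the final verb.
constituents = ["Na Neun", "ONeul", "2Si E", "HagSaingHoiGoan ESeo", "ChinGu Oa", "JeomSim Eul"]
verb = "MeogEossDa"

def scrambles(parts, verb):
    """Yield every order of the pre-verbal parts, always keeping the verb last (Rule 1)."""
    for order in permutations(parts):
        yield " ".join(order + (verb,))

orders = list(scrambles(constituents, verb))
print(len(orders))  # 720, i.e. 6!
print(orders[0])    # the original order of (2-m)
```

Roles survive every permutation because they are carried by the postpositions, not by position. Examples (2-n) to (2-r) go further and restructure the sentence around “MeogEunGeos Eun” to focus one word, which this sketch does not attempt.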

Postpositions function much like English prepositions: they describe the relationships that the immediately preceding nouns or noun phrases have with other words in the clause or sentence. A description of this function can be found in any work on Korean grammar. The following is a translation of the essential points made by Cho [6].

Cho defines postpositions as “words that do not have independent meanings of their own, but when attached to other words, give them grammatical relationships with the rest of the words or additional meanings.” Some of the prominent properties of postpositions, as described by Cho [6], are as follows:

(1) Because postpositions are the only words in Korean that do not have independent meanings of their own, they can be distinguished from all other classes of words.

(2)  They  are  usually  put  at  the  end  of  nouns,  adverbs  or  other  postpositions. 
(2-a) JaJeonGeo“Reul” SassDa - following a noun
BICYCLE BOUGHT

(I) bought a bicycle

(2-b) NalSsiGa MobSi“Do” NaBbeuJi? - following an adverb
WEATHER VERY BAD?

The weather is very bad, isn’t it?

(2-c) DangSin“GgaJi”“Ga” HabGyeogIRaNe - following a postposition
YOU UP TO ACCEPTED

Those who are accepted are up to and including you

When  the  attachment  happens,  the  preceding  words  do  not  alter  their  endings,  unless  they 
are  pronouns.  Even  pronouns  do  not  always  change  their  endings. 

(2-d) Na“Ga” → NaiGa - vowel “a” changed to “ai”
I I

(2-e) Na“Neun” → NaNeun - no change
I I


(3)  Some  sets  of  postpositions  have  identical  meanings  but  are  used  differently  depending  on  the 
ending  of  the  preceding  syllable,  i.e.,  whether  the  ending  is  a  vowel  or  a  consonant.  The  following 
examples  illustrate  this  property.  “Reul”  and  “Eul”  have  identical  meanings,  that  is,  they  indicate 
that  the  preceding  word  is  a  direct  object.  However,  “Reul”  is  used  when  the  ending  of  the  preceding 
word  is  a  vowel,  whereas  “Eul”  is  used  when  the  ending  is  a  consonant. 

(2-f) Neo“Reul” - vowel ending “eo”

YOU

(2-g) Chaig“Eul” - consonant ending “g”

BOOK
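Property (3) is a purely phonological choice, so a generator can pick the form from the last sound of the preceding word. The sketch below applies that test in two ways: to the romanization used in this report, where it assumes a final a, e, i, o, or u marks a vowel ending (a heuristic), and to Hangul text, where a precomposed syllable has no final consonant exactly when its offset from U+AC00 is a multiple of 28. The subject and topic pairs come from the role table later in this section; the table and function names are illustrative, and irregular pronoun forms such as (2-d) are not handled.

```python
# (after-vowel form, after-consonant form) for three role-assigning postpositions.
ROMANIZED = {"object": ("Reul", "Eul"), "subject": ("Ga", "I"), "topic": ("Neun", "Eun")}
HANGUL = {"object": ("를", "을"), "subject": ("가", "이"), "topic": ("는", "은")}

def attach_romanized(word: str, role: str) -> str:
    """Heuristic for this report's romanization: a final a/e/i/o/u means a vowel ending."""
    after_vowel, after_consonant = ROMANIZED[role]
    return word + (after_vowel if word[-1].lower() in "aeiou" else after_consonant)

def attach_hangul(word: str, role: str) -> str:
    """Precomposed Hangul syllables (U+AC00..U+D7A3) encode the final consonant
    as (code - 0xAC00) % 28; zero means the syllable ends in a vowel."""
    last = ord(word[-1])
    if not 0xAC00 <= last <= 0xD7A3:
        raise ValueError("last character is not a precomposed Hangul syllable")
    after_vowel, after_consonant = HANGUL[role]
    return word + (after_vowel if (last - 0xAC00) % 28 == 0 else after_consonant)

print(attach_romanized("Neo", "object"))    # NeoReul, as in (2-f)
print(attach_romanized("Chaig", "object"))  # ChaigEul, as in (2-g)
print(attach_hangul("너", "object"), attach_hangul("책", "object"))  # 너를 책을
```

The Hangul test is exact, whereas the romanized one depends on the spelling conventions of this report.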


(4) Some postpositions, such as those that mean “of” and “be,” can be omitted without altering the meaning of the phrase. More generally, a postposition may be omitted whenever the omission does not obscure the grammatical relationships among the words in a sentence.
(2-h) URi“Eui” NaRa → URiNaRa
OUR NATION OUR NATION

(2-i) IGeosEun YeoJa“I”“Go” JeoGeosEun NamJa“I”“Da”
→ IGeosEun YeoJa“I”“Go” JeoGeosEun NamJa“Da”
THIS FEMALE BE AND THAT MALE BE

This is a female, and that is a male

(2-j) NeoNeun SugJe“Reul” Hai → NeoNeun SugJe Hai

YOU HOMEWORK DO YOU HOMEWORK DO

You do the homework


(5) Postpositions can be classified into three categories: conjunctive, complementary, and role-assigning. Conjunctive postpositions are similar to the English word “and” in the sense that they connect the preceding and following words, which share a common property, into a group. Complementary postpositions add special meanings to the preceding words, such as comparing, lower/upper bounding, all-including, beginning, ending, selecting, and limiting. Role-assigning postpositions assign roles (subject, object, etc.) to nouns and noun phrases. This group of postpositions will be explained further, as English has no equivalent.

Role-assigning  postpositions  have  the  following  properties. 


1. Role-assigning postpositions follow nouns, noun phrases, and gerunds.

2. The roles that can be assigned, and the postpositions that assign them, are as follows:

   subject - I/Ga, Neun/Eun, GgeSeo, ESeo, Seo
   direct object - Eul/Reul
   indirect object - E/EGe, HanTe
   possessive - Ui
   adverb - URo
   calling - A/Ya, IYeo
   verb - IDa “be”


Rule 2 will be explored further when discussing how Korean verbs behave.
Rule 4 reveals the most distinctive characteristic of ordering in Korean. In English, modifiers can follow what they modify, whereas Korean modifiers always precede what they modify. Consider this example.

(2-k) WAITRESS SERVING POTATO CHIPS

would be translated as

(2-l) POTATO CHIPS SERVING WAITRESS.
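Rule 4 can be seen as a difference in where a head is placed relative to its dependents. The sketch below stores the example above as a small dependency tree and linearizes it head-first for the English order and head-last for the Korean order. The `Node` class and `linearize` function are illustrative, and the words are the English glosses.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    word: str
    dependents: List["Node"] = field(default_factory=list)

def linearize(node: Node, head_final: bool) -> List[str]:
    """Emit a node and its dependents, placing the head after them when head_final."""
    deps = [w for d in node.dependents for w in linearize(d, head_final)]
    return deps + [node.word] if head_final else [node.word] + deps

# WAITRESS is modified by SERVING, which takes POTATO CHIPS as its object.
tree = Node("WAITRESS", [Node("SERVING", [Node("POTATO CHIPS")])])
print(" ".join(linearize(tree, head_final=False)))  # WAITRESS SERVING POTATO CHIPS
print(" ".join(linearize(tree, head_final=True)))   # POTATO CHIPS SERVING WAITRESS
```

A uniformly head-first setting is itself a simplification of English, where, for instance, subjects precede their verbs.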


Rule  5  needs  special  attention,  as  it  allows  a  wide  variety  of  word  ordering.  Consider  the 
following  examples. 


(2-m)  Na  Neun  ONeul  2Si  E  HagSaingHoiGoan  ESeo  ChinGu  Oa 

I  TODAY  2  O’CLOCK  AT  STUDENT  CENTER  AT  FRIEND  WITH 

JeomSim  Eul  MeogEossDa. 

LUNCH  ATE. 


(2-n)  Nai  Ga  2Si  E  HagSaingHoiGoan  ESeo  ChinGu  Oa  JeomSim 

I  2  O’CLOCK  AT  STUDENT  CENTER  AT  FRIEND  WITH  LUNCH 

Eul MeogEunGeos Eun ONeul IEossDa.

EATING  TODAY  WAS. 


(2-o)  Nai  Ga  ONeul  HagSaingHoiGoan  ESeo  ChinGu  Oa  JeomSim  Eul 
I  TODAY  STUDENT  CENTER  AT  FRIEND  WITH  LUNCH 

MeogEunGeos  Eun  2Si  EossDa. 

EATING  2  O’CLOCK  WAS. 
(2-p) Nai Ga ONeul 2Si E ChinGu Oa JeomSim Eul MeogEunGeos

I  TODAY  2  O’CLOCK  AT  FRIEND  WITH  LUNCH  EATING 

Eun  HagSaingHoiGoan  ESeo  YeossDa. 

STUDENT  CENTER  AT  WAS. 

(2-q)  Nai  Ga  ONeul  2Si  E  HagSaingHoiGoan  ESeo  JeomSim  Eul 

I  TODAY  2  O’CLOCK  AT  STUDENT  CENTER  AT  LUNCH 

MeogEunGeos  Eun  ChinGu  Oa  YeossDa. 

EATING  FRIEND  WITH  WAS. 

(2-r)  Nai  Ga  ONeul  2Si  E  HagSaingHoiGoan  ESeo  ChinGu  Oa 

I  TODAY  2  O’CLOCK  AT  STUDENT  CENTER  AT  FRIEND  WITH 

MeogEunGeos Eun JeomSim IEossDa.

EATING  LUNCH  WAS. 


(2-m) can be translated as “I had lunch with a friend at the student center at 2 o’clock today.” The subsequent sentences place special emphasis on the words “today,” “2 o’clock,” “student center,” “friend,” and “lunch,” respectively, by putting them close to the verb. Postpositions are what make this scrambling possible in Korean while still preserving the essential meaning of the original sentence and the role each word plays.

2.2  Conjunctive  Relations 

Conjunctive  relations  can  be  divided  into  two  parts:  temporal  and  logical  [6].  The  temporal 
relation  determines  how  events  should  be  ordered  by  using  words  like  “after,”  “before,”  “during,” 
“lead  to,”  “result,”  and  “then.”  The  logical  relation  determines  the  logical  connections  among  the 
events. 

2.3  Verb  Suffixes 

The verbs are perhaps what most distinguishes Korean from other languages. A great number of suffix variations (with slight and subtle differences in meaning) mark not only past, present, and future tense, as in English, but also other traits such as politeness and the speaker’s degree of familiarity with the listener [6]. The honorific/polite suffixes are discussed first in this section.
2.3.1 Honorific/Polite Suffixes

(1)  When  the  subject  of  a  sentence  is  of  a  higher  rank  than  the  speaker,  in  order  to  show  respect 
to  the  subject,  honorific  suffixes  are  added  to  the  verb.  “Si”  is  one  of  the  most  widely  used  honorific 
suffixes. 

(2-s) EoMeoNiGgeSeo JinJiReul Deu“Si”EossDa
MOTHER  MEAL  ATE 

(My)  mother  ate  the  meal 

When “Si” is combined with another suffix, “Ob,” it becomes an even stronger honorific suffix.

(2-t) ImGeumNimGgeSeoNeun SuRaReul Deu“Si”“Ob”SoSeo
KING  MEAL  EAT 

Please  eat  the  meal,  (my)  king 

Note that “JinJi” and “SuRa” mean the same thing but are used differently. “JinJi” is already an honorific noun for “Bab” (meal) in Korean, but “SuRa” is so honorific that it is used only when referring to the meals of a king. This level of honorific matches the use of “SiOb” in example (2-t).

Consider  the  following  example. 

(2-u) DongSaingI NajJamEul JanDa

YOUNGER BROTHER NAP SLEEP

(My) younger brother is taking a nap

(2-v) SeonSaingNimGgeSeoNeun NajJamEul JuMu“Si”nDa
TEACHER NAP SLEEP

(My)  teacher  is  taking  a  nap 

“JanDa” means “to sleep.” Its honorific form, “JuMuSinDa,” uses a different stem, “JuMu,” together with “Si.”
(2) When an honorific/polite verb also indicates tense, the endings follow the order honorific-tense-polite. For example, consider the word “eat.”

   Lexical          - MeogDa
   Honorific        - JabSu“Si”Da
   Past Honorific   - JabSu“Si”“Eoss”Da
   Future Honorific - JabSu“Si”“Gess”Da
   Polite           - JabSu“Si”“GessSaO”

The citation form of “eat” is “MeogDa.” Its honorific form, with “Si,” is “JabSuSiDa.” Adding “Eoss” to the honorific form makes it past honorific, whereas adding “Gess” makes it future honorific. Furthermore, the honorific form with “GessSaO” becomes the polite form.
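The honorific-tense-ending order above can be expressed as a small conjugation function. This is a sketch under several assumptions: the suppletive honorific stems (Meog to JabSu, Ja to JuMu) come from a hypothetical lookup table; the choice between “Ass” and “Eoss” is approximated from the last romanized vowel group; the choice between “n” and “Neun” uses a vowel-final test on the romanization; and contractions and consonant-final stems that need an extra vowel before “Si” are not handled.

```python
import re

HONORIFIC_STEMS = {"Meog": "JabSu", "Ja": "JuMu"}  # hypothetical suppletive-stem table

def ends_in_vowel(s: str) -> bool:
    return s[-1].lower() in "aeiou"

def conjugate(stem, honorific=False, tense=None, polite=False):
    """Build stem + [Si] + [tense] + final ending, in honorific-tense-polite order."""
    word = stem
    if honorific:
        word = HONORIFIC_STEMS.get(stem, stem) + "Si"
    if tense == "past":
        # Approximate vowel harmony: a last vowel group of "a" or "o" takes "Ass".
        last_vowel = re.findall(r"[aeiou]+", word.lower())[-1]
        word += "Ass" if last_vowel in ("a", "o") else "Eoss"
    elif tense == "present":
        word += "n" if ends_in_vowel(word) else "Neun"
    elif tense == "future":
        word += "Gess"
    elif tense is not None:
        raise ValueError(f"unknown tense: {tense}")
    # "SaO" is the polite ending shown above; "Da" is the plain declarative ending.
    return word + ("SaO" if polite else "Da")

print(conjugate("Meog", honorific=True, tense="past"))  # JabSuSiEossDa
print(conjugate("Ja", tense="present"))                  # JanDa
print(conjugate("Ja", honorific=True, tense="present"))  # JuMuSinDa
```

The same function reproduces the plain tense forms of the next subsection (MeogEossDa, MeogNeunDa, MeogGessDa) because the honorific slot is simply left empty.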

2.3.2  Tense  Suffixes 

(1) The three basic tenses are past, present, and future. To indicate these tenses, “Eoss/Ass,” “n/Neun,” and “Gess” are added, respectively. Again, consider the word “eat.”


Past - Meog“Eoss”Da
Present - Meog“Neun”Da
Future - Meog“Gess”Da


(2)  These  tenses  may  be  superimposed  as  in  the  following  examples. 

(2-w) JiGeumJjeumEun MulGoGiReul Jab“Ass”“Gess”Da
BY NOW FISH HAVE CAUGHT

(He) must have caught a fish by now

(2-x) GeuDdaiNeun MulGoGiReul Jab“Ass”“Eoss”Da
THAT TIME FISH CAUGHT

(I) caught a fish at that time

(3) “Gess” is used to mean both “shall” and “will.”

2.3.3  Type-Defining  Suffixes 

(1) Some of the widely used types of sentences in Korean include statements, exclamations, interrogatives, commands, and requesting sentences. These are completely analogous to their English counterparts. Again, let us use the word “eat” to demonstrate them.
(2-y) AGiGa BabEul Meog“NeunDa”
BABY MEAL EATING

The baby is eating a meal

(2-z) AGiGa BabEul Meog“NeunGuNa”
BABY MEAL EATING!

The baby is eating a meal!

(2-aa) AGiGa BabEul Meog“NeuNya”
BABY MEAL EAT?

Is the baby eating a meal?

(2-ab) AGaYa BabEul Meog“EoRa”
BABY! MEAL EAT

Eat the meal, baby.

(2-ac) AGaYa BabEul Meog“Ja”
BABY! MEAL LET’S EAT

Let’s eat the meal, baby


(2)  Roughly  speaking,  there  are  two  kinds  of  verbs.  One  kind  of  verb  is  called  DongSa;  it  describes 
the  movements  of  humans,  animals,  etc.  The  other  kind  of  verb  is  called  HyeongYongSa;  it  describes 
states  of  objects.  In  English,  the  latter  are  not  classified  as  verbs,  but  as  adjectives  with  the  verb 
“be.”  Words  such  as  “be  beautiful,”  “be  large,”  “be  hungry”  are  two-word  verbs  composed  of  the 
“be”  verb  and  an  adjective  in  English.  But  in  Korean,  they  are  simply  one-word  verbs. 

With  HyeongYongSa,  some  limitations  are  imposed  regarding  what  types  of  sentences  are 
possible.  HyeongYongSa  cannot  be  used  for  commands  and  requesting  sentences.  Furthermore, 
“ARa/EoRa”  are  used  to  make  exclamations  when  they  are  attached  to  HyeongYongSa,  whereas 
they  make  commands  when  attached  to  DongSa. 
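The type-defining endings in (2-y) through (2-ac), together with the restriction on HyeongYongSa, can be captured as a table plus one check. The endings below are the ones attached to “Meog” in those examples; treating them as fixed strings for every stem, the `is_descriptive` flag, and the function name are simplifications assumed for this sketch.

```python
# Type-defining endings as they appear on "Meog" in (2-y) to (2-ac).
ENDINGS = {
    "statement": "NeunDa",
    "exclamation": "NeunGuNa",
    "interrogative": "NeuNya",
    "command": "EoRa",
    "request": "Ja",
}

def sentence_form(stem: str, sentence_type: str, is_descriptive: bool = False) -> str:
    """Attach a type-defining ending; HyeongYongSa (descriptive verbs) cannot
    form commands or requesting sentences."""
    if is_descriptive and sentence_type in ("command", "request"):
        raise ValueError(f"HyeongYongSa cannot form a {sentence_type}")
    return stem + ENDINGS[sentence_type]

print(sentence_form("Meog", "interrogative"))  # MeogNeuNya, as in (2-aa)
print(sentence_form("Meog", "command"))        # MeogEoRa, as in (2-ab)
```

A fuller version would also record that “ARa/EoRa” forms an exclamation rather than a command on a HyeongYongSa stem.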

2.3.4  Conjunctive  Suffixes 

(1) Conjunctive suffixes that enumerate complementing phrases are “Go,” “Myeo,” and “MyeonSeo.”

(2-ad) JeonHoaHa“MyeonSeo” TVReul BonDa
TELEPHONE  TV  WATCH 

(I)  am  watching  TV  while  talking  on  the  phone 
(2) Those that enumerate opposite phrases are “GeoNa”/“GiNa,” “DeunJi”/“DeonJi,” “GeoNi,” and “NeuNi.”

(2-ae) Ga“DeunJi” Mal“DeunJi” Ne MaEumDaiRo HaiRa

GO NOT YOUR MIND PLEASES DO

You decide whether to go or not to go as you desire

(3) Some conjunctive suffixes relate the preceding and following phrases in specific ways.

“Na” and “JiMan” are equivalent to the English “although.”

(2-af) ManhI Jass“JiMan” AJigDo JolRiDa

A LOT SLEPT STILL SLEEPY

Although (I) have slept a lot, (I) am still sleepy

“RyeoGo” and “Ryeo” are equivalent to “in order to.”

(2-ag) IlJjig Ggae“RyeoGo” IlJjig JassDa
EARLY GET UP EARLY SLEPT

(I) went to bed early to wake up early

“NeuRaGo,” “ASeo”/“EoSeo,” “AYa”/“EoYa,” and “GeoMan”/“GeoNiOa” are used to indicate that the preceding phrase is the cause of the following phrase.

(2-ah) BiGaO“aSeo” USanI PilYoHaissDa
RAINING  UMBRELLA  NEEDED 

(I)  needed  an  umbrella  because  it  was  raining 

“nDe” is used to present the background for the following phrase.

(2-ai) SimSimHa“nDe” MuEossEul HalGga?

BORED  WHAT  DO 

(I)  am  bored.  What  shall  (I)  do? 
2.3.5 Gerund Suffixes

(1) Suffixes such as “m” and “Gi” make gerunds out of verbs.

(2-aj) GeuReul DdaRaHa“m”Eun HimDeulDa
HIM FOLLOWING HARD

Following him (doing what he does) is hard

(2-ak) GuaJaReul Meog“Gi”Ga SilhDa.

COOKIES EATING DISLIKE

(I) dislike eating the cookies

In (2-aj) the gerund serves as the subject of the sentence, and in (2-ak) it serves as the direct object.

(2) Very frequently, “n Geos” is used to form a gerund. This form has the same meaning as “m” and “Gi” but provides more flexibility in using the gerund. It is used more often in colloquial language.

(2-al) GuaJaReul Meog“Neun”“Geos”I SilhDa.

COOKIES EATING DISLIKE

(I) dislike eating the cookies
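The three gerund patterns can be sketched as one function that nominalizes a stem and a second that attaches a postposition by final sound, which together reproduce the forms in (2-aj), (2-ak), and (2-al). The function names are hypothetical; the sketch accepts only vowel-final stems for “m” (consonant-final stems need an extra vowel that is not covered here) and writes “Neun Geos” as one word.

```python
def nominalize(stem: str, kind: str) -> str:
    """Turn a verb stem into a gerund with "m", "Gi", or the colloquial "Neun Geos"."""
    if kind == "m":
        if stem[-1].lower() not in "aeiou":
            raise ValueError("this sketch only covers vowel-final stems for 'm'")
        return stem + "m"
    if kind == "Gi":
        return stem + "Gi"
    if kind == "Geos":
        return stem + "NeunGeos"
    raise ValueError(f"unknown gerund type: {kind}")

# (after-vowel form, after-consonant form) for the Ga/I and Neun/Eun pairs.
PAIRS = {"Ga": ("Ga", "I"), "Neun": ("Neun", "Eun")}

def with_postposition(gerund: str, pair: str) -> str:
    """Attach the form of the postposition that matches the gerund's final sound."""
    after_vowel, after_consonant = PAIRS[pair]
    return gerund + (after_vowel if gerund[-1].lower() in "aeiou" else after_consonant)

print(with_postposition(nominalize("DdaRaHa", "m"), "Neun"))  # DdaRaHamEun, as in (2-aj)
print(with_postposition(nominalize("Meog", "Gi"), "Ga"))      # MeogGiGa, as in (2-ak)
print(with_postposition(nominalize("Meog", "Geos"), "Ga"))    # MeogNeunGeosI, as in (2-al)
```

Keeping nominalization and postposition choice separate mirrors the way a gerund behaves like any other noun once it is formed.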

Some  of  the  distinctive  Korean  language  phenomena  have  been  explored  in  this  section.  The 
next  section  discusses  how  these  phenomena  could  be  implemented  in  a  language  generator  called 
GENESIS. 
3. GENESIS


GENESIS is a language generator that produces well-formed sentences from a semantic representation. It takes the semantic representation of a source language text as input and generates the target language translation. Before discussing the mechanism of GENESIS, it is necessary to look at the structure of its input, the semantic frame.

3.1  Semantic  Frames 

The  meaning  representation  that  is  used  as  input  to  GENESIS  is  called  a  semantic  frame. 
The  semantic  frame  ideally  captures  the  meaning  of  the  speech  in  the  source  language  with  the 
hierarchical  dependencies  among  the  parts  of  the  speech  preserved.  The  semantic  frame  recognizes 
that  sentences  are  composed  of  clauses,  topics,  and  predicates  [7].  Note  that  “predicate”  includes 
adjectives  and  prepositional  phrases,  as  well  as  verbal  predicates.  See  the  semantic  frame  of  a 
sample  sentence  below.  The  corresponding  parse  tree  is  shown  in  Figure  4. 


Input: Request permission to defend hilltop echo

Semantic Frame (CCL)

{c statement
   :mode "fpl"
   :member "fpl"
   :pred {p v_request
          :topic {q permission
                  :complement {p fortify
                               :aux "to"
                               :topic {q hilltop
                                       :pred {p initials
                                              :topic "echo" }}}}}}


The  structure  and  the  function  of  the  semantic  frame  will  be  discussed  in  more  detail  when 
discussing  how  GENESIS  uses  the  semantic  frame. 
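To make the nesting of clauses (c), predicates (p), and topics (q) concrete, the sketch below reads the bracketed notation of the sample frame into nested objects. It is not GENESIS's own frame reader; it assumes only the notation as printed above: an opening brace, a one-letter type, a name, then `:key value` pairs in which a value is a quoted string, a bare word, or another braced frame. `Frame` and `parse_frame` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# Braces, quoted strings, :keys, and bare words (frame types and names).
TOKEN = re.compile(r'\{|\}|"[^"]*"|:[\w-]+|[^\s{}"]+')

@dataclass
class Frame:
    kind: str  # "c" clause, "p" predicate, "q" topic
    name: str
    keys: dict = field(default_factory=dict)

def parse_frame(text: str) -> Frame:
    tokens = TOKEN.findall(text)
    pos = 0

    def value():
        nonlocal pos
        if tokens[pos] == "{":
            return braced()
        tok = tokens[pos]
        pos += 1
        return tok.strip('"')

    def braced():
        nonlocal pos
        kind, name = tokens[pos + 1], tokens[pos + 2]
        pos += 3  # skip "{", the type letter, and the name
        frame = Frame(kind, name)
        while tokens[pos] != "}":
            key = tokens[pos]
            if not key.startswith(":"):
                raise ValueError(f"expected a :key, got {key!r}")
            pos += 1
            frame.keys[key[1:]] = value()
        pos += 1  # skip "}"
        return frame

    return braced()

FRAME = '''
{c statement :mode "fpl" :member "fpl"
   :pred {p v_request
          :topic {q permission
                  :complement {p fortify :aux "to"
                               :topic {q hilltop
                                       :pred {p initials :topic "echo"}}}}}}
'''
f = parse_frame(FRAME)
print(f.keys["pred"].keys["topic"].keys["complement"].name)  # fortify
```

Each level of braces becomes one `Frame`, so the dependency of "hilltop" on "fortify", and of "fortify" on "permission", is preserved as object nesting.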

3.2  Mechanism  of  GENESIS 

There are two major parts to GENESIS. One is the kernel, which does not change with the target language. The other is the shell, which realizes the output sentences and is therefore target-language dependent [7]. The kernel is the engine of the system: it paraphrases the semantic frame and generates the output using the information about the target language embedded in the shell. The shell specifies the characteristics of the target language with three modules: a lexicon, a set of messages, and a set of rewrite rules [7]. The mechanism of the engine will be described implicitly when discussing the details of each module. Note that because the semantic frame is encoded in English, entries in the lexicon and the set of messages are expressed in English. This by no means implies that English is the most suitable language for semantic representation; it is chosen only for convenience, as most people using the system can understand English.

3.2.1  Lexicon 

The lexicon associates each semantic frame entry with its corresponding form in the target language, taking into consideration linguistic phenomena such as inflection [7]. Table 1 shows an example lexicon for English. Each entry in the lexicon consists of a name, a part-of-speech tag [e.g., N (Noun), PREP (Preposition), V (Verb)], a stem, and various derived forms. Part-of-speech entries specify the default endings for entries whose morphological variants are regular. For example, a typical noun (N) in English becomes plural when an “s” is attached to the end. These default values can be overridden by explicit lexical entries, as for the English verbs “be” and “do.”


TABLE 1

Example Lexicon Entries for English

Entry   Tag   Stem and Derived Forms
V       V     "Verb" THIRD "es" ROOT "e" ING "ing" ...
N       N     "NOUN" PL "s"
be      X     "be" ROOT "be" THIRD "is" ING "being" ...
do      X     "do" THIRD "does" ... MODE "root" ...
will    X     ... MODE "future" ...
2       D     "two" CARDINAL "second"

Each entry can have its own grammatical specifications that are needed to produce a correct
lexical form [7]. To illustrate the need for this, consider that Korean has two different ways of
reading Arabic numerals. One uses native Korean words and the other uses Sino-Korean words (i.e.,
Korean words of Chinese origin). The Sino-Korean pronunciation is not identical to the pronunciation
that Chinese speakers use today. The Sino-Korean reading is used in most ordinary contexts, such as
mathematical terms, telephone numbers, or room numbers. However, for numbering things in order
(ordinal numbers, which the lexicon marks with the CARDINAL feature) or for people's ages, native
Korean is used. Another example is that auxiliary verbs set the mode of the main verb: as Table 1
shows, "do" in English sets the mode of the main verb to "root," and "will" sets it to "future."
Whether a particular entry is treated as a verb or as an adjective is also controlled in the lexicon
and can be language dependent. This feature is especially relevant for Korean, as many adjectives
have verb-like properties in main clauses.
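
The default-plus-override scheme of Table 1, together with per-entry grammatical specifications
such as the MODE that an auxiliary imposes, can be sketched as follows. This is a minimal Python
sketch under stated assumptions: the names POS_DEFAULTS, LEXICON, inflect, and mode_set_by are
hypothetical, the dictionary layout is not the GENESIS lexicon file format, and the regular noun
"book" is an invented entry. The irregular forms and modes are taken from Table 1.

```python
from typing import Optional

# Hypothetical, simplified model of a GENESIS-style lexicon (not the real file format).
# Part-of-speech entries supply default suffixes for regular words (Table 1: N has PL "s").
POS_DEFAULTS = {"N": {"PL": "s"}}

# Word entries: part-of-speech tag, stem, explicit forms, and other grammatical specifications.
LEXICON = {
    "book": {"pos": "N", "stem": "book", "forms": {}, "specs": {}},  # invented regular noun
    "be": {"pos": "X", "stem": "be",
           "forms": {"ROOT": "be", "THIRD": "is", "ING": "being"}, "specs": {}},
    "do": {"pos": "X", "stem": "do", "forms": {"THIRD": "does"}, "specs": {"MODE": "root"}},
    "will": {"pos": "X", "stem": "will", "forms": {}, "specs": {"MODE": "future"}},
}


def inflect(word: str, feature: str) -> str:
    """Surface form of `word` for an inflectional feature such as "PL" or "THIRD"."""
    entry = LEXICON[word]
    if feature in entry["forms"]:  # an explicit entry overrides the default
        return entry["forms"][feature]
    suffix = POS_DEFAULTS.get(entry["pos"], {}).get(feature)
    if suffix is None:
        raise KeyError(f"no {feature} form for {word!r}")
    return entry["stem"] + suffix


def mode_set_by(word: str) -> Optional[str]:
    """Mode that an auxiliary imposes on the main verb, or None if it sets none."""
    return LEXICON[word]["specs"].get("MODE")


print(inflect("book", "PL"))   # books
print(inflect("be", "THIRD"))  # is
print(mode_set_by("will"))     # future
```

Keeping defaults on the part-of-speech entry means that only irregular words need explicit forms,
which is the economy the lexicon description above relies on.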

3.2.2  Messages 

Messages are grammar templates of the target language that control the ordering of the parts of
speech [7]. The topics, predicates, and clauses of a semantic frame are transformed recursively into
phrases of the target language according to this set of grammar templates [7]. A typical template
consists of a message name and a sequence of words and/or keywords that describe the message.
Words can be inserted before or after any of the keywords, and a default value can be specified for
use when a keyword has no value.
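
A minimal Python sketch of how such a template might be filled, assuming a simplified
representation: the Slot class and the fill function are hypothetical and do not reproduce the
actual GENESIS message syntax. Each slot names a keyword, may carry words to insert before or after
its value, and may supply a default. Words attached to a keyword are emitted only when the keyword
has a value.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Slot:
    """A keyword position in a message template (hypothetical representation)."""
    keyword: str
    before: str = ""               # word inserted before the keyword's value
    after: str = ""                # word inserted after the keyword's value
    default: Optional[str] = None  # value used when the frame supplies none


Template = List[Union[str, Slot]]  # plain strings are literal words


def fill(template: Template, frame: Dict[str, str]) -> str:
    """Realize a template against a flat frame mapping keywords to phrases."""
    words: List[str] = []
    for item in template:
        if isinstance(item, str):  # a literal word always appears
            words.append(item)
            continue
        value = frame.get(item.keyword) or item.default
        if not value:
            continue  # empty keyword: its before/after words are dropped as well
        words.extend(w for w in (item.before, value, item.after) if w)
    return " ".join(words)


arrive = [Slot("TOPIC", default="it"), Slot("PREDICATE"), Slot("WHEN", before="at")]
print(fill(arrive, {"TOPIC": "the plane", "PREDICATE": "arrived", "WHEN": "10 AM"}))
# the plane arrived at 10 AM
print(fill(arrive, {"PREDICATE": "arrived"}))
# it arrived
```

The example template is English only for readability; the same mechanism orders Korean phrases
once the keywords are arranged in Korean order.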

3.2.3  Rewrite  Rules 

The rewrite rules are intended to handle linguistic phenomena that are hard to deal with through
the mechanisms of the lexicon and the messages [7]. Typical phenomena are phonotactic constraints and
contractions. For example, rewrite rules can be used to choose the correct form of the indefinite
article, "a" or "an," or to merge "a other" into "another." The flexibility of the rewrite rules is
not limited to these examples and will be explored further in the discussion of how they are used
for Korean.
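
A minimal Python sketch of ordered rewrite rules for the two English examples above. It uses
regular expressions, which is an assumption made for brevity; the rule file described later in
this report is a plain two-column search-and-replace table. The rule that merges "a other" must run
before the "a"/"an" rule, otherwise it would produce "an other." The vowel-letter test is a
simplification that mishandles words such as "hour" or "unit."

```python
import re

# Ordered (pattern, replacement) rules; order matters.
RULES = [
    (re.compile(r"\ba other\b"), "another"),       # contraction: "a other" -> "another"
    (re.compile(r"\ba (?=[aeiouAEIOU])"), "an "),  # "a" -> "an" before a vowel letter
]


def rewrite(text: str) -> str:
    """Apply every rule in order to the generated string."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


print(rewrite("send a other tank"))         # send another tank
print(rewrite("a enemy tank and a truck"))  # an enemy tank and a truck
```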

3.3  GENESIS  for  Korean 

Appendices  A,  B,  and  C  contain  the  files  for  lexicon,  messages,  and  rewrite-rules,  respectively. 

3.3.1  Lexicon 

The lexicon has nine distinct linguistic subcategories: adjectives, conjunctions, auxiliary verbs,
clauses, determiners, nouns, pronouns, adverbs, and verbs. For each of these subcategories, a list
is provided that enumerates the semantic frame names in that category along with their counterparts
in the target language. In other words, this lexicon functions like the bilingual lexicon of a
typical transfer translation system between two languages, except that the source side of the
lexicon is derived from the semantic frame rather than from the input text.

Many English words are lexically ambiguous in the sense that they have multiple meanings. For
example, the adjective "heavy" could mean having great weight, hard to bear, serious, profound,
difficult, and so on. Although the single English word "heavy" can capture all these different
meanings, each meaning must be mapped to a different semantic frame word so that the meanings can be
distinguished appropriately in the target language. Unfortunately, because of an insufficient
number of training sentences, only a small portion of such multiple meanings has been incorporated
into the lexicon. Where a single semantic frame word still has more than one meaning, the meaning
most likely to be used in a military context has been chosen for the Korean equivalent; for
"heavy," for example, "DaeGyuMoEui," meaning "large-scaled," has been chosen.

It is also possible for two different semantic frame adjectives to have a single Korean equivalent.
Unlike the case above, this does not create much trouble, as the precise meaning becomes clear from
the context of the translated Korean sentence.

The Korean language does not have articles such as "the" or "a." Nonetheless, occasions arise when
the meaning of an article must be included explicitly. "A" can be translated to "HaNaEui" or to the
contracted form "Han." A typical Korean speaker would use the contracted form in speech. When
"HaNaEui" is used, it directly describes the following noun with the meaning of "one," as in
"HaNaEui Chaeg," meaning "one book." "Han" functions a bit differently: when "Han" is used, a
counting noun (often called a 'classifier') follows. For example, "one book" can also be translated
to "Chaeg HanGueon." Here, "Gueon" is the counting noun designated specifically for counting books.

One unsolved problem with adjectives stems from the fact that they can be used to describe nouns
and can also be used in combination with the "be" verb to describe a state. This would not cause any
problems if the target language used adjectives in the same way, but Korean is not such a language.
For one thing, Korean does not have linking or auxiliary verbs. Before a possible solution to this
problem is suggested, it is necessary to describe how "be" verbs can be reflected in Korean verbs and
what kinds of verbs are found in Korean.

"Be" verbs have at least four different functions in English. The first indicates the existence of
an object, as in "there is a book on the table." The second indicates that two things are equal in
meaning, as in "God is love." The third is its use as an auxiliary verb with past participles of
intransitive verbs. Finally, the fourth is its use with adjectives to describe the state of an
object. Korean handles each of these four cases differently, by manipulating the endings of the
related verbs. A system that translates English into Korean would therefore have to tie the "be"
verb in the English sentence to the correct inflections of the related Korean verbs.

There are two kinds of verbs in Korean: action verbs and adjectival verbs. Action verbs behave just
like their counterparts in English; they simply express acts and occurrences. Adjectival verbs,
however, are verbs that describe a mode of being and are equivalent to adjectives with "be" verbs in
English. Korean has special verb endings to handle the first three functions of "be" verbs: "IssDa,"
"IDa," and "EossDa." These verb endings cover the three roles, along with some changes within the
roots of verbs. However, Korean does not have a simple verb that describes the state of an object by
means of an adjective. Instead, it has adjectival verbs. In other words, a phrase like "is pretty"
is treated as one verb in Korean and can be translated to "GobDa." These adjectival verbs do not have
as many complex verbal endings as action verbs (see Section 2).

Conjunctions do not pose as much difficulty as adjectives. However, "and" can be lexically
ambiguous: it can be used when enumerating things or when connecting two parallel clauses. Korean
uses different words for these two cases: "Oa," "Goa," or "HaGo" for the former, and "GeuRiGo" for
the latter. Again, a distinction between the two cases is needed. Note that choosing the wrong word
would not obscure the meaning of the translation; it would only reduce its fluency.

Korean does not have specific linking verbs or auxiliary verbs. Instead, each verb has its own
variety of endings that carry the meaning that linking or auxiliary verbs are designed to deliver.
As a consequence, the linking and auxiliary verbs of the semantic frame are not mapped to any Korean
words. They only specify the mode of the verb so that the proper verb ending can be chosen. For
example, the auxiliary verb "will" sets the mode of the corresponding Korean verb to "future." This
"future" mode is then used when the proper ending of the verb is selected, a process discussed
later in this section.

Clause-level semantic frames can also set the mode. (See the entries for "command1" and "command2"
in Appendix A.) These are not similar to auxiliary verbs, but they are nonetheless useful signals
when selecting the proper Korean verb endings. Note that commands have two different modes.
"Command1" refers to imperative sentences, whose purpose is "to direct authoritatively."
"Command2" is used for suggestions, as in "let's do..."

Most determiners can be mapped directly to equivalent Korean words without much lexical ambiguity.
Numbers are included in this category. In Korean, as in English, there is a distinction between
counting numbers and ordinal numbers. The biggest difference is that Chinese (Sino-Korean)
pronunciation is used for counting numbers and native Korean pronunciation is used for ordinal
numbers. For most items that need numbering, including mathematics, Sino-Korean pronunciation is
used. The exceptions are ordinal numbers and the ages of people, for which native Korean
pronunciation is used. The syllable "Jjae" is attached to form ordinal numbers; its role is similar
to that of "-th" in English.

Nouns also have lexical ambiguity, just as adjectives do. Some nouns, such as "eagle" or "east,"
have straightforward equivalents in Korean, but most nouns do not. When choosing among many possible
translations, the ones most likely to be used in a military context were chosen. One example is the
word "terrain." There are at least three Korean translations for this word: "JiHyeong," "JiSe," and
"JiYeog." Among these, "JiYeog" was chosen because it seemed the most likely to be used in a
military context. When a semantic frame word has multiple translations and is not a military term,
the translation most likely to be used by educated civilians in daily life has been chosen.

Pronouns can be mapped straightforwardly. Each semantic frame pronoun has a Korean equivalent and
one additional piece of information, called NUM, which tells whether the pronoun is first, second,
or third person.

The current lexicon contains only three adverbs, all of which have a simple mapping.

Sohn categorizes Korean verbs into eight distinct groups in three broad classes: four kinds of
regular-consonant-final verbs, three kinds of irregular-consonant-final verbs, and one kind of
vowel-final verb [8]. The verb classification of the initial system was based on this scheme, but
it later proved inadequate. Although Sohn's approach may be linguistically rigorous, it omitted quite
a few classes of verbs needed for this project and was too coarse for its purposes. The current
system uses a modified version of Sohn's classification.

The classification is based on how verbal endings change when the verbs are used in different kinds
of sentences: present tense sentences (first person singular, second person singular, third person
singular, first person plural, and second person plural), present continuing tense sentences, future
tense sentences, command sentences, requesting ("let's do...") sentences, case clauses, and
infinitive phrases.

These cases are certainly not exhaustive, and at the same time some of them are redundant. For
example, interrogative and exclamatory sentences are not considered. Also, Korean verb endings do not
distinguish among first, second, and third person, and singular and plural sentences use the same
verbal endings. In other words, Korean verbs do not use all the features that English verbs do. This,
however, does not mean that Korean language generation is simple. Korean verbs have many linguistic
features that English verbs lack, and these present the most difficult problem in generating Korean
from an interlingua. Before this problem is discussed, the features of the verb classification and
its structure are described below.

The most noticeable difference between the modified classification and the original one is the
addition of "HaDa" verbs. Sohn's approach does not consider these as verbs; nevertheless, they
constitute the majority of all verbs in Korean. "HaDa" means "do" in English and always follows a
noun. Hence, a noun with "HaDa" attached to it becomes a verb meaning "to do the action denoted by
the noun." A typical example is "JeonHoaHaDa." Here, "JeonHoa" means "telephone" in English, so
"JeonHoaHaDa" means "to call." This verb has the basic form "JeonHoa." Once the usage of the verb
has been decided, one of 11 possible endings is suffixed to it. The possible endings are
"HaGiReul," "HaGo IssDa," "HaRa," "HaJa," "Hal GeosIDa," "Hal Ddae," "HanDa," "HanDa," "HanDa,"
"HanDa," and "HanDa." These endings correspond to the root form, present continuing, command,
request, future, case clause, first singular, second singular, third singular, second plural, and
first plural usages, respectively.
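
The ending table just listed can be sketched as a lookup keyed by verb class and mode. This is a
minimal Python sketch under stated assumptions: the dictionary names, the mode labels, the
"statement" clause type, and the helper realize_verb are hypothetical. The mapping from clause type
or auxiliary to mode follows the command1, command2, and "will" examples described earlier in this
section, and letting the clause type take precedence over the auxiliary is likewise an assumption.

```python
from typing import Optional

# Endings of the "HaDa" verb class, as listed in the text.
HADA_ENDINGS = {
    "root": "HaGiReul",
    "present_continuing": "HaGo IssDa",
    "command": "HaRa",
    "request": "HaJa",
    "future": "Hal GeosIDa",
    "case_clause": "Hal Ddae",
    "first_singular": "HanDa",
    "second_singular": "HanDa",
    "third_singular": "HanDa",
    "second_plural": "HanDa",
    "first_plural": "HanDa",
}
VERB_CLASSES = {"HaDa": HADA_ENDINGS}

# Korean lexicon: semantic-frame verb -> (basic form, verb class).
KOREAN_VERBS = {"call": ("JeonHoa", "HaDa")}

# Hypothetical mode sources (cf. "command1", "command2", and the auxiliary "will").
MODE_FROM_CLAUSE = {"command1": "command", "command2": "request"}
MODE_FROM_AUXILIARY = {"will": "future"}


def realize_verb(verb: str, clause_type: str, auxiliary: Optional[str] = None,
                 default_mode: str = "third_singular") -> str:
    """Attach to the basic form the ending that its class prescribes for the selected mode."""
    basic_form, verb_class = KOREAN_VERBS[verb]
    mode = MODE_FROM_CLAUSE.get(clause_type)
    if mode is None and auxiliary is not None:
        mode = MODE_FROM_AUXILIARY.get(auxiliary)
    if mode is None:
        mode = default_mode
    return basic_form + VERB_CLASSES[verb_class][mode]


print(realize_verb("call", "command1"))           # JeonHoaHaRa  ("Call me.")
print(realize_verb("call", "command2"))           # JeonHoaHaJa
print(realize_verb("call", "statement", "will"))  # JeonHoaHal GeosIDa
```

Adding another verb class only requires another ending table in VERB_CLASSES; the selection logic
does not change.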

To see how the mechanism works, consider a semantic frame for the sentence "Call me." This sentence
would be recognized as a command. The system looks up the Korean verb mapped to "call" and finds
"JeonHoa." Because the verb is categorized as a "HaDa" verb, and because the sentence is recognized
as a command, the system searches the "HaDa" verbal endings for the command ending and finds
"HaRa." The basic form "JeonHoa" is then combined with "HaRa" to make "JeonHoaHaRa."

The regular-consonant-final verbs can be grouped into two classes. The final consonants of these
verbs are "S," "D," "B," and "T." Although Sohn classifies these verbs as linguistically regular,
their final consonant has not been found to have any apparent relationship with their verbal
endings. For example, "MudDa" and "BadDa," meaning "to bury" and "to receive," are both D-final
regular-consonant-final verbs, but they belong to two different classes in the current system.

The sole difference between the two classes lies in the treatment of command sentences. The first
class takes an "EoRa" ending, whereas the second takes an "ARa" ending. These two classes could be
merged into one by using "EuRa" in place of both "EoRa" and "ARa." In normal Korean speech, however,
"EoRa" and "ARa" are used almost exclusively to form commands with regular-consonant-final verbs.
"EuRa" is used only when discussing the Korean language in a linguistic context, or by a minority of
military personnel; it is a very authoritative and demanding form of command and sounds peculiar in
modern Korean speech. Furthermore, using "EoRa" and "ARa" instead of "EuRa" would not cause confusion
in any imaginable context. For these reasons, "EoRa" and "ARa" were chosen, producing two classes.

Unlike the regular-consonant-final verbs, each of the three irregular-consonant-final verb groups
had to have its own class. The apparent difference between the classes for the
regular-consonant-final verbs and those for the irregular-consonant-final verbs is that some endings
of the latter begin with an "S," "D," or "B" consonant. These consonants had to be added because of
the way the corresponding verbs are written. For example, "to draw" in Korean is "GeusDa," where
"Geus" is the stem of the verb. The future form of this verb is "GeuEul GeosIDa." Notice that the
"S" in the stem has been omitted. For this reason, the stem is represented by "Geu." Where the ending
requires that "S" be part of the stem, the ending has its own "S" at the front, as in the present
continuous form "S Go IssDa." (See Appendix A for the various endings of these verbs.)

The  vowel-final  verbs  have  only  one  class. 

3.3.2  Messages 

As  indicated  in  Section  2,  the  basic  Korean  grammar  is  very  different  from  the  English 
grammar.  The  order  of  words  in  a  sentence,  the  usage  of  postpositions  rather  than  prepositions, 
and  various  verbal  endings  are  the  three  most  pronounced  features  among  the  linguistic  phenomena 
of  the  Korean  language.  The  messages  file  captures  the  particular  features  of  the  first  two  linguistic 
phenomena. 

Consider the English sentence "I am going to school now." Following Korean word order with English
words, this sentence would be rendered as "I now school to going am." In colloquial Korean,
however, the same sentence would be rendered as "I now school go." Notice that the word order is
completely different from that of English and that the postposition has been dropped in the
colloquial style. Under normal circumstances this omission does not distort the intended meaning,
because the speaker and the listener generally know the topic of the conversation, and features of
speech such as intonation help clarify any potential confusion arising from the omission. A
translation system cannot rely on such cues, however, so the current setup of the messages uses
postpositions whenever possible in order to reduce the likelihood of ambiguity. This setting is to
be modified in the future, as it leads to rather stilted speech.

The word order is formed recursively as the topics, predicates, and clauses of a semantic frame are
transformed into phrases of the target language according to the grammar templates. Each template
consists of a message and a sequence of words and keywords that describe the message. Messages are
semantic frame words that are to be translated into their corresponding target language words.
Keywords are the names of categories to which groups of words with common linguistic properties
belong. OBJECT_PRONOUN, for example, covers all the pronouns in the system that can be used as
objects. In the messages file, the messages are the lowercase words in the leftmost column, and each
message has its describing words and keywords in its row. An example of a message will help
illustrate how word order is decided.

Consider the semantic frame word "pass." When this word is transformed into the target language,
its describing keywords state that the words making up a phrase with "pass" will follow the order
OBJECT_NOUN, ADV_WHEN, TOPIC, ADV_DEGREE, ADV_MAIN, ADV_SOLE, and PREDICATE, with the PREDICATE
being "pass." Simply put, the order of the keywords in each message decides the order of the target
language words associated with that message. As a semantic frame is translated into the target
language, each of its words is examined at least once, which ensures that the final output has the
order specified by the messages involved.

To see how each of these messages contributes when a semantic frame is transformed recursively into
the target language, consider the English input sentence "CALL SOME ARTILLERY." The sentence is
identified as being of type command1. Under this message, the listed keywords, in order, are
OPENING, ID1, TOPIC, PREDICATE, ID2, and CLOSING. Among these keywords, the only one relevant to the
sentence is PREDICATE, as the sentence has no opening words, identification words, topics, or
closing words. Therefore, the entire sentence is a predicate of type command1. The first word of the
predicate is "call." Under the message "call," the listed keywords are OBJECT_PRONOUN, TOPIC,
ADV_DEGREE, ADV_MAIN, ADV_SOLE, and PREDICATE. Now the relevant keywords are TOPIC and PREDICATE, the
topic being "some artillery" and the predicate being "call." Because TOPIC comes before PREDICATE,
the Korean words for "some artillery" are placed before the Korean word for "call." Finally, the
topic "some artillery" is itself analyzed further to obtain the correct order. Note that the keyword
TOPIC for "call" has "Eul" following it, indicating that the topic is the object of the sentence
whose predicate is "call." "Eul" is attached right after the Korean phrase for "some artillery."
Therefore, the final output, expressed in English words in Korean order, becomes
"some artillery Eul call."
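
The walk-through above can be mimicked with a small recursive function. This is a minimal Python
sketch, not the GENESIS implementation, and it rests on several assumptions: frames are nested
dictionaries with a "name" key, each message is a list of (keyword, following word) pairs, and inside
a verb's own message the PREDICATE slot is filled by the verb itself. Only the keywords listed in
the text for command1 and "call" are modeled, and "some artillery" is treated as an already realized
phrase rather than being analyzed further.

```python
from typing import List, Union

# Messages: ordered (keyword, word that follows the keyword's value) pairs.
MESSAGES = {
    "command1": [("OPENING", ""), ("ID1", ""), ("TOPIC", ""),
                 ("PREDICATE", ""), ("ID2", ""), ("CLOSING", "")],
    "call": [("OBJECT_PRONOUN", ""), ("TOPIC", "Eul"), ("ADV_DEGREE", ""),
             ("ADV_MAIN", ""), ("ADV_SOLE", ""), ("PREDICATE", "")],
}
CLAUSE_MESSAGES = {"command1"}  # clause messages take PREDICATE from the frame

Frame = Union[str, dict]


def generate(frame: Frame) -> List[str]:
    """Recursively order the words of a frame according to its message."""
    if isinstance(frame, str):  # leaf phrase, already realized
        return [frame]
    head = frame["name"]
    words: List[str] = []
    for keyword, follower in MESSAGES[head]:
        if keyword == "PREDICATE" and head not in CLAUSE_MESSAGES:
            words.append(head)  # assumption: a verb's PREDICATE slot is the verb itself
            continue
        value = frame.get(keyword)
        if not value:
            continue  # empty keyword: its following word is dropped too
        words.extend(generate(value))  # recurse into sub-frames
        if follower:
            words.append(follower)
    return words


sentence = {"name": "command1",
            "PREDICATE": {"name": "call", "TOPIC": "some artillery"}}
print(" ".join(generate(sentence)))  # some artillery Eul call
```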

Notice that there is an "np-call" message below the "call" message. The prefix "np" indicates that
"call" is used not as a main predicate, but as a predicate modifying a noun phrase. Because the
example given uses "call" as its main verb, "call" has been used instead of "np-call."

Notice also that some lowercase words, such as Eun, GeunCheoE, or Eul, are inserted between the
keywords. Most of these are postpositions. Unlike the keywords, which are written in uppercase
letters and marked with a prefix symbol, these words are not linguistic categories but simply words
that will later appear as written. They do not always appear, however: only when the keywords they
follow have nonempty values do they appear as written.

One class of Korean postpositions indicates that what precedes is a subject, as discussed in detail
in Section 2: Eun, Neun, I, and Ga. In brief, two of them, "Eun" and "Neun," have the same meaning
but are chosen according to the ending of the subject. If the subject ends with a vowel, the
postposition is "Neun"; if it ends with a consonant, the postposition is "Eun." The messages file
uses Eun by default. When the subject is found to end with a vowel, Eun is replaced by Neun. This
finding and replacing is done by rewrite rules, which will be discussed in the next section. The
other two postpositions, "I" and "Ga," also share a meaning. The difference between these two and
the two above is that these two are used for nouns that take the definite article in English. This
subclass is not used in the current system setup because the parse tree decoder currently ignores
the difference between nouns with articles and nouns without articles.

Another class of postpositions indicates that what precedes is an object. This class can be divided
into two subclasses: one for direct objects and the other for indirect objects. For now, all objects
are assumed to be direct objects. This assumption has not caused any problems, because the training
sentences do not contain any indirect objects. Furthermore, it simplifies the system setup so that
only two postpositions are needed for this class: "Eul" and "Reul." "Eul" is used when the preceding
object ends with a consonant, and "Reul" is used when it ends with a vowel. The default is "Eul";
just as in the case of "Neun" and "Eun," rewrite rules replace it with "Reul" when necessary.
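
The vowel/consonant condition behind "Eun"/"Neun" and "Eul"/"Reul" can be checked directly on
Korean characters. This minimal Python sketch decides the form when the postposition is attached,
whereas the system described here writes the default form and corrects it later with rewrite
rules. It relies on the layout of precomposed Hangul syllables in Unicode (U+AC00 to U+D7A3), where
each initial-plus-vowel block spans 28 code points and the first one has no final consonant. The
names ends_in_consonant and attach are hypothetical, and the postposition is attached without a
space, following standard Korean orthography.

```python
from typing import Tuple

HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3  # range of precomposed Hangul syllables

# (form after a final consonant, form after a vowel)
SUBJECT_MARKER: Tuple[str, str] = ("은", "는")  # "Eun" / "Neun"
OBJECT_MARKER: Tuple[str, str] = ("을", "를")   # "Eul" / "Reul"


def ends_in_consonant(word: str) -> bool:
    """True if the last syllable of `word` has a final consonant."""
    code = ord(word[-1])
    if not HANGUL_FIRST <= code <= HANGUL_LAST:
        raise ValueError(f"{word!r} does not end in a precomposed Hangul syllable")
    # Offset 0 within a 28-code-point block means the syllable has no final consonant.
    return (code - HANGUL_FIRST) % 28 != 0


def attach(noun: str, marker: Tuple[str, str]) -> str:
    """Append the correct form of a postposition to a Korean noun."""
    after_consonant, after_vowel = marker
    return noun + (after_consonant if ends_in_consonant(noun) else after_vowel)


print(attach("고양이", SUBJECT_MARKER))  # 고양이는  (cat: ends in a vowel)
print(attach("책", OBJECT_MARKER))       # 책을  (book: ends in a consonant)
print(attach("쥐", OBJECT_MARKER))       # 쥐를  (mouse: ends in a vowel)
```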

Semantic frame prepositions such as "at," "of," "to," "near," and "from" proved to be very
troublesome because they have many different meanings and possible translations. Consider two
English sentences that use "at": "The plane arrived at 10 AM" and "He pointed at me." In Korean, "at"
corresponds to "E" in the first sentence and to "EGe" or "Reul" in the second. TINA and GENESIS can,
however, assign different roles to prepositions depending on the meaning of the associated noun
phrase [9]. This allows the lexicon to contain semantically specific prepositions that know
precisely which form they should be translated to. The current system does not yet fully exploit
this feature and uses the most general translations instead. This scheme will soon be changed.
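
A lexicon that exploits role-specific prepositions, with a fallback to the general translation the
current system uses, could look like the following minimal Python sketch. The role labels ("time",
"target") and the function translate_preposition are hypothetical illustrations rather than TINA or
GENESIS names; the Korean forms are those given in the text, with "ESeo" as the general translation
of "at" used in the messages table.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lexicon keyed by (preposition, role of the noun phrase).
# Role labels are invented for illustration; role None holds the general translation.
PREPOSITIONS: Dict[Tuple[str, Optional[str]], str] = {
    ("at", "time"): "E",      # "The plane arrived at 10 AM"
    ("at", "target"): "EGe",  # "He pointed at me"
    ("at", None): "ESeo",     # general translation of "at"
}


def translate_preposition(prep: str, role: Optional[str] = None) -> str:
    """Prefer a role-specific postposition; otherwise fall back to the general one."""
    specific = PREPOSITIONS.get((prep, role))
    return specific if specific is not None else PREPOSITIONS[(prep, None)]


print(translate_preposition("at", "time"))      # E
print(translate_preposition("at", "target"))    # EGe
print(translate_preposition("at", "location"))  # ESeo
```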

3.3.3  Rewrite  Rules 

The rewrite rules file has two columns. The first column is a list of character strings to be
searched for, and the second column is a list of character strings that will replace the
corresponding element in the first column once it has been found. The rules are organized in three
sections.

The simplest section deals with postpositions such as "GgaJi," "ESeo," and "GeunCheoE." These
postpositions were used in the messages table as translations for "up to," "at," and "near,"
respectively. Rewrite rules replace these romanized postpositions with the corresponding Korean
characters.
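
This first section amounts to a two-column lookup. In the minimal Python sketch below, matching
whole space-separated tokens rather than raw substrings is an assumption (it keeps "ESeo" from
matching inside some other romanized word), and the name apply_rules is hypothetical. Korean
spacing conventions, under which postpositions are written attached to the noun, are not handled.

```python
from typing import Dict

# Two-column rule table: romanized postposition -> Korean characters.
POSTPOSITION_RULES: Dict[str, str] = {
    "GgaJi": "까지",        # "up to"
    "ESeo": "에서",         # "at"
    "GeunCheoE": "근처에",  # "near"
}


def apply_rules(text: str, rules: Dict[str, str]) -> str:
    """Replace each whole token found in the first column with its second-column entry."""
    return " ".join(rules.get(token, token) for token in text.split())


print(apply_rules("기지 ESeo", POSTPOSITION_RULES))  # 기지 에서
```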

The second section completes the verb classifications of the lexicon. For example, category V4 has
"S Go IssDa" as the ending for the present continuous form. The "S" is supposed to be the final
consonant of the stem. When GENESIS runs, however, "S" is recognized not as the end of the stem but
as a stand-alone consonant not attached to any word. This section fixes the problem by eliminating
the space between the stem and "S." Because there are numerous combinations of this sort, a simple
program has been written to generate them automatically from a small set of short tables. This
program also automates the last section.
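
The effect of this section can be sketched in Python, assuming the generator emits the stem and the
ending's leading consonant as separate tokens in Korean characters, e.g., "그 ㅅ고 있다" for the
stem "Geu" and the ending "S Go IssDa." The program described here lists every combination as an
explicit rule; this sketch instead computes the merged syllable with Unicode Hangul arithmetic, a
swapped-in technique: adding a final-consonant index to a syllable that has no final consonant
yields the syllable with that final. The names FINAL_INDEX and attach_stem_consonants are
hypothetical.

```python
import re

HANGUL_FIRST = 0xAC00
# Compatibility jamo for S, D, B -> their index among the 28 Hangul final-consonant slots.
FINAL_INDEX = {"\u3145": 19, "\u3137": 7, "\u3142": 17}  # ㅅ, ㄷ, ㅂ

# A precomposed syllable, a space, then a stand-alone S, D, or B consonant.
PATTERN = re.compile("([\uAC00-\uD7A3]) ([\u3137\u3142\u3145])")


def _merge(match: "re.Match") -> str:
    syllable, consonant = match.group(1), match.group(2)
    if (ord(syllable) - HANGUL_FIRST) % 28 != 0:
        return match.group(0)  # syllable already has a final consonant; leave it alone
    return chr(ord(syllable) + FINAL_INDEX[consonant])


def attach_stem_consonants(text: str) -> str:
    """Remove the space between a stem and the consonant that completes it."""
    return PATTERN.sub(_merge, text)


print(attach_stem_consonants("그 \u3145고 있다"))  # 긋고 있다
```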

The last section completes the postpositions specified in the messages file. As explained, "Eun" is
set as the default postposition for indicating subjects. This is correct only when the subject ends
with a consonant; when it ends with a vowel, this section changes "Eun" to "Neun." In the same way,
the default object postposition "Eul" is changed to "Reul" after a vowel.

These rewrite rules were found to be very long and largely patterned, so a program was written to
generate them automatically. (See Appendix C.) The program uses three input data files:
"first-consonants," "all-vowels," and "final-consonants." "First-consonants" contains all the
consonants that can begin a Korean syllable, "all-vowels" lists all Korean vowels, and
"final-consonants" lists only the consonants relevant to rewrite rule generation. The program first
generates all combinations of the entries in the three files. These combinations are written in
romanized Korean, and some of them do not occur in Korean at all. To sift out those that cannot be
used, a conversion program converts the romanized Korean into Korean characters; during this
conversion, impossible outcomes are represented by blanks. This rough list of Korean syllables is
then converted back to romanized Korean, and the blanks are removed, producing a clean chart of
rewrite rules. Finally, this clean chart is converted to Korean. Rules computed in this way are
combined with a list of rules for special cases, ultimately producing the korean-rewrite-rules text
file.
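
The generation of the subject and object marker rules can be sketched in Python. This sketch swaps
in a different technique from the romanize, convert, and filter procedure described above: it builds
syllables directly from Unicode code points, where every initial-consonant and vowel pair (19 x 21)
is a valid syllable, so no filtering is needed. The assumptions are that the marker is emitted as a
separate, space-separated word and that the special-case rules are omitted; the names open_syllables,
marker_rules, and apply are hypothetical.

```python
import re
from typing import Iterator, List, Tuple

HANGUL_FIRST = 0xAC00
N_INITIALS, N_VOWELS, N_FINALS = 19, 21, 28


def open_syllables() -> Iterator[str]:
    """Every precomposed Hangul syllable with no final consonant (initial + vowel)."""
    for initial in range(N_INITIALS):
        for vowel in range(N_VOWELS):
            yield chr(HANGUL_FIRST + (initial * N_VOWELS + vowel) * N_FINALS)


def marker_rules() -> List[Tuple[str, str]]:
    """Rules turning the default consonant-form markers into vowel forms after open syllables."""
    rules = []
    for syllable in open_syllables():
        rules.append((syllable + " 은", syllable + " 는"))  # Eun -> Neun
        rules.append((syllable + " 을", syllable + " 를"))  # Eul -> Reul
    return rules


def apply(text: str, rules: List[Tuple[str, str]]) -> str:
    """Apply each rule only where the marker is a whole word (followed by a space or the end)."""
    for search, replacement in rules:
        text = re.sub(re.escape(search) + r"(?!\S)", replacement, text)
    return text


RULES = marker_rules()
print(len(RULES))                # 798
print(apply("고양이 은", RULES))  # 고양이 는
print(apply("책 을", RULES))      # 책 을  (책 ends in a consonant, so no rule fires)
```

The whole-word condition matters: without it, a rule for "가 은" would also rewrite the start of an
unrelated word such as "은행" that happens to follow "가".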

While the three GENESIS tables needed for Korean generation were being produced, it was found that
GENESIS has some deficiencies for Korean generation. These deficiencies stem either from the
inherent linguistic nature of Korean or from the fact that GENESIS is still evolving. The following
section gives suggestions for improving GENESIS to address some of these deficiencies.

4.  PROPOSED  IMPROVEMENTS  TO  GENESIS


This section discusses the Korean language phenomena that are not currently handled adequately by
TINA and/or GENESIS. There are two reasons for these shortcomings. One is that TINA and GENESIS are
still evolving and improving, so what cannot be handled at this point may reflect not inadequacies
of TINA or GENESIS but simply mechanisms that have not been implemented yet. The handling of
negation and of passive voice sentences are examples. The other is that Korean is so different from
English that some linguistic phenomena occurring in English simply cannot be represented in Korean,
and vice versa. Translating prepositions into postpositions illustrates this point. For each of the
following cases, a suggestion for an implementation that solves or reduces the translation problem
is given.

4.1  Negations  and  Passive  Voice  Sentences 

The current generation system handles only a subset of all the possible kinds of Korean
sentences: it has no mechanism for handling negative sentences, and it restores passive
voice sentences to active voice. This is a byproduct of the choice of training data, a
transcription of Task Force Command Net exercise control messages, chosen to train CCLINC (Common
Coalition Language at LINColn Laboratory) to suit brigade communications [4]. Because
the control messages are usually expressed in the positive and in the active voice, the parser does not have to
analyze negation or passive voice sentences; hence the current lack of such a
mechanism in the generation system [10].

Once the grammar can analyze these kinds of sentences, the modifications needed for the
generation system would be quite simple, because negation and passive voice are both reflected in and
handled solely by postpositions and verb endings [6]. A sentence could be negated simply by changing
the ending of the main verb; an active voice sentence could be transformed into the passive voice
by replacing the postpositions for the subject and the object and by changing the ending of
the main verb. Examples follow; note that (1-d) is a negated passive voice sentence.


(1-a) GoYangIGa JuiReul JabAssDa - positive and active
CAT MOUSE CAUGHT

(1-b) GoYangIGa JuiReul JabJi MosHaissDa - negative and active
CAT MOUSE CATCH DID NOT

(1-c) JuiGa GoYangIEGe JabHyeossDa - positive and passive
MOUSE CAT CAUGHT

(1-d) JuiGa GoYangIEGe JabHiJi AnhAssDa - negative and passive
MOUSE CAT CATCH DID NOT
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Because  GENESIS  is  table-driven,  the  necessary  modifications  can  be  implemented  quite 
easily. The messages file would need a message that handles passive voice sentences, and the
lexicon file would need extended verbal classifications to accommodate negations and passive
voice. Note that negated sentences may not require a new message in the
messages file, since their only deviation from the statement message is the ending of the main verb.

A possible approach to incorporating negations and passive voice would be to
use the same method as the one used for the auxiliary verb “WILL” in the lexicon. By setting a
mode for negations and for passive voice, it would become possible to introduce new verbal inflections
for each of the eight verbal categories to generate the correct verbal endings. For example, the first
category could have a new mode “PASSIVE,” followed by “DoiDa,” to handle passive voice
sentences. Careful attention will be needed, however, when the passive voice is accompanied by
another mode such as “WILL.” In that case, a mechanism that accepts multiple modes will be
necessary. Similar arguments apply to negations. This approach can be extended to cover the
enormous number of inflectional endings in Korean, as follows.

Korean verb endings usually carry more than one inflection. Inflections include passive, honorific,
sentence marker, etc. Each of these inflection modes has several variations, and the proper
inflection is chosen based both on the verb stem and on the two preceding syllables [11]. The inflections
also occur in a fixed order, as follows [11].

Verb  stem  +  Passive  +  Honorific  +  Negative  +  Tense  +  Sentence  marker 

With the exception of tense and sentence marker in a main clause, inflections
are optional. Each inflection mode has more than one variation. Some frequently occurring
inflections are listed below.

1.  Passive  -  “Doi,”  “I,”  “Hi,”  “Gi” 

2.  Honorific  -  “Si,”  “EuSi” 

3. Negative - “Anh,” “JiAnh”

4.  Tense  (Past)  -  “Ass,”  “Eoss,”  “ss” 

5.  Tense  (Present)  -  “Eun,”  “n” 

6.  Tense  (Future)  -  “Gess” 

7.  Sentence  marker  (Declarative)  -  “Da” 

8. Sentence marker (Interrogative) - “Ni”

9.  Sentence  marker  (Authoritative)  -  “Ha” 

The inflection modes and inflection variations listed above are not exhaustive. However,
for the purpose of battle management, they are sufficient to generate reasonably fluent and adequate
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verbal endings. The missing modes or inflection variations are either extraneous for our purpose or
their status is still controversial [11].
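The fixed ordering above lends itself to a simple composition routine. The Python sketch below is a hypothetical illustration, not part of GENESIS. It checks that the requested inflections follow the order Passive, Honorific, Negative, Tense, Sentence marker, and that a main clause carries both a tense and a sentence marker. It then concatenates the morphemes. It does not choose between variants such as "Ass" and "Eoss," and it applies no contractions. For example, "Jab" + "Hi" + "Eoss" + "Da" would yield the uncontracted "JabHiEossDa" rather than "JabHyeossDa."

```python
SLOT_ORDER = ["passive", "honorific", "negative", "tense", "marker"]


def compose_verb(stem, inflections, main_clause=True):
    """Attach (slot, morpheme) pairs to a verb stem in the fixed slot order.

    The caller chooses each morpheme (e.g. "Ass" vs. "Eoss"); this sketch
    only enforces ordering and the main-clause requirements.
    """
    slots = [slot for slot, _ in inflections]
    positions = [SLOT_ORDER.index(slot) for slot in slots]  # ValueError if unknown
    if positions != sorted(set(positions)):
        raise ValueError("inflections repeated or out of order: %s" % slots)
    if main_clause and not {"tense", "marker"} <= set(slots):
        raise ValueError("a main clause needs a tense and a sentence marker")
    return stem + "".join(morpheme for _, morpheme in inflections)


# Example (1-d): passive + negative + past + declarative
print(compose_verb("Jab", [("passive", "Hi"), ("negative", "JiAnh"),
                           ("tense", "Ass"), ("marker", "Da")]))
# JabHiJiAnhAssDa
```

The word spacing shown in (1-d), "JabHiJi AnhAssDa," is not modeled. The same routine gives "GanDa" from "Ga" with present "n" and declarative "Da," as in (1-k) below.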

If GENESIS could define a “:VERB_MODE” line in the messages file that
orders the various modes, and if a set of modal settings were specified for different kinds of verbs in the
lexicon file, then, with some code modification to attach all those modal endings to the verb stem,
the following scheme could handle the complexity of Korean verb endings.

Consider Figure 5, which depicts the inflections that “HaDa” verbs take and how they
should be composed to form a complete ending. Each arrow indicates which inflections can follow
a particular inflection. To illustrate the flow, let us examine the word “SiJagHaDa,” which means
“begin” in English, under the following circumstances:

1.  “Begin”  with  Past  +  Declarative 

2. “Begin” with Honorific + Present + Interrogative

“Begin” with past tense and a declarative sentence marker is formed by the following scheme,
whose arrow flow is marked A1 and A2. The arrows begin with the verb stem “SiJag,” to which
“Haiss” is attached to form the past tense inflection. Finally, “SiJagHaiss” is combined
with “Da” to form the complete verb representing “begin” with past tense and a declarative sentence
marker. Note that for this process to work, GENESIS has to be able to (1) recognize “SiJag” as
a “HaDa” verb, (2) skip the passive, honorific, and negative modals, (3) recognize “Haiss” as
the correct tense inflection for the past, (4) recognize “Da” as the sentence marker for declarative
sentences, and (5) combine them in the correct order.

“Begin” with the honorific inflection, present tense, and an interrogative sentence marker
follows a similar, though somewhat more complicated, procedure. Arrows B1, B2, and B3
indicate the flow. As explained, these arrows indicate which inflections to attach; in this case, they
are “HaSi,” “n,” and “-n +b NiGga.” Note that “n” is needed to form the present declarative verb
ending. Here, however, this consonant must be eliminated and “b” added to the second syllable
of “HaSi.” The final form is “SiJagHaSibNiGga” (followed by a question mark). As can be
seen, GENESIS needs to be able to recognize what -n and +b mean, in addition to the
capabilities previously mentioned.
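The two walks through Figure 5 can be reproduced with a small helper that attaches inflections and interprets the "-n" and "+b" markers. This is a minimal Python sketch under several assumptions. The whitespace-separated operation syntax is hypothetical. "-n" is read as "drop a trailing n" and "+b" as "append b to the last syllable." The allowed arrows of Figure 5 are not encoded; the caller supplies the path.

```python
def attach(word, inflection):
    """Apply one Figure 5 inflection to a partially built verb.

    Tokens are applied left to right: "-n" removes a final "n", "+b" adds a
    final "b", and any other token is appended as a literal morpheme.
    """
    for token in inflection.split():
        if token == "-n":
            if not word.endswith("n"):
                raise ValueError("no final 'n' to remove in %r" % word)
            word = word[:-1]
        elif token == "+b":
            word += "b"
        else:
            word += token
    return word


def follow_path(stem, path):
    """Attach each inflection along an arrow path, e.g. A1-A2 or B1-B3."""
    word = stem
    for inflection in path:
        word = attach(word, inflection)
    return word


print(follow_path("SiJag", ["Haiss", "Da"]))               # SiJagHaissDa
print(follow_path("SiJag", ["HaSi", "n", "-n +b NiGga"]))  # SiJagHaSibNiGga
```

Encoding "-n" and "+b" as edit operations keeps the table data separate from the string surgery. This is one way GENESIS might be taught what those markers mean.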

The two instances previously discussed illustrate the mechanism of the figure and the capabilities
that need to be implemented in GENESIS. Most verb inflections follow more or less
the same procedure. Note that although “HaDa” verbs belong to verb group
V as defined in the lexicon file (see Appendix A), there are some exceptions: both “GuHaDa” and
“UeonHaDa” do not exactly follow the pattern depicted in Figure 5. They deviate in the
passive inflections but follow the rest of the pattern, so a new verb classification is necessary.
Appendix D shows five different sets of inflection patterns. Even though these five sets cover only a
subset of the verbs in the current lexicon file, they will serve as a good starting point for
generating inflection patterns that cover the entire spectrum of Korean verbs.
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4.2  Articles 


The current parser does not exploit its ability to analyze articles [10], but even if it did, articles
would have no correct mapping to Korean, because Korean has no exact counterparts to the
English definite and indefinite articles. Still, if desired, definite articles can be encoded by Korean
demonstratives such as “I,” “Geu,” or “Jeo,” and indefinite articles can be encoded
by “HaNaEui.” Even though these forms can partially capture the meaning of English
articles, and the translations would therefore carry more meaning, the resulting translations would sound quite
awkward. Translating articles between different languages is difficult because they do not obey the
same linguistic rules.

4.3  Styles  of  Speech 

What distinguishes Korean from many other languages is its versatility in expressing the relative
positions of listener and speaker, including their ranks, ages, genders, and so forth. Even
though these styles are commonly classified with linguistic terms such as honorific and polite,
the variety of styles is so great that a limited number of simple linguistic terms is
simply not adequate. Consider the following examples.


(1-e) JeoNeun HagGyoE GabNiDa - an educated child speaking to an elder
I SCHOOL GO

(1-f) JeoNeun HagGyoE GaJiYo - less formal than (1-e)

(1-g) JeoNeun HagGyoE GaYo - less polite than (1-e)

(1-h) Jeo HagGyoE GaYo - less formal than (1-g)

(1-i) NaNeun HagGyoE GaYo - a child speaking to an older person

(1-j) Na HagGyoE GaYo - less formal than (1-i)

(1-k) Na HagGyoE GanDa - a friend speaking to a friend

(1-l) Na HagGyoE Ga - same as (1-k)

(1-m) Na HagGyoE GanDa Yai - female speech of (1-l)

(1-n) HagGyoE GaJi - an older person speaking to a younger person
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(1-o) HagGyoE Ga - same as (1-n)

(1-p) HagGyoE GanDanDa - a female speaking to a younger person

(1-q) HagGyoE GanDaGuYo

(1-r) HagGyoE GanDaGu

The examples above are far from exhaustive. Although the number of styles is large, it should
be possible to produce the right style simply by switching the verbal classification section of the
lexicon file. This will require adding a discourse module to the existing system to determine
the context in which the source language is used.
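As a rough illustration of style switching, the Python sketch below treats each speech style as an interchangeable table of subject forms and verb endings. Each table stands in for an alternative verbal classification section of the lexicon file. The style names and the table layout are hypothetical. Only the forms in examples (1-e), (1-g), (1-k), and (1-l) are covered, and the endings are shown only for the vowel-final stem "Ga." Choosing the style, the job of the proposed discourse module, is left to the caller.

```python
# Hypothetical style tables standing in for alternative "verbal
# classification" sections of the lexicon file.
STYLES = {
    "formal_polite": {"subject": "JeoNeun", "ending": "bNiDa"},  # (1-e)
    "polite":        {"subject": "JeoNeun", "ending": "Yo"},     # (1-g)
    "plain":         {"subject": "Na",      "ending": "nDa"},    # (1-k)
    "intimate":      {"subject": "Na",      "ending": ""},       # (1-l)
}


def render(verb_stem, place, style):
    """Build 'subject place verb' in the requested speech style."""
    entry = STYLES[style]
    return " ".join([entry["subject"], place, verb_stem + entry["ending"]])


for style in STYLES:
    print(style, "->", render("Ga", "HagGyoE", style))
# formal_polite -> JeoNeun HagGyoE GabNiDa
# polite -> JeoNeun HagGyoE GaYo
# plain -> Na HagGyoE GanDa
# intimate -> Na HagGyoE Ga
```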

4.4  Preposition  vs  Postposition 

One of the striking differences between English and Korean is that Korean uses postpositions
instead of prepositions. Because postpositions serve functions similar to those of prepositions,
prepositions are usually translated into postpositions and vice versa. Although this approach
seems to be the only possible choice, it is bound to fail in some cases, because a single English
preposition can have multiple meanings. If the meaning of a particular preposition in
a sentence could be extracted and represented perfectly in the semantic frame, this problem might
be eliminated. This is, however, very difficult to achieve. Even if the analysis component
perfectly distinguished each meaning of a particular English preposition, Korean might not
have postpositions corresponding to all the distinguished meanings, in which case the one-to-one
mapping method used in Korean language generation fails.

Note that this problem is even more severe in the transfer approach. The machine
translation systems developed at the Korean Advanced Institute of Science and Technology
(KAIST) and at Seoul National University (SNU) suffer from the same problem, as evidenced by
the test evaluations documented at MITRE [12]. Their systems replace default Korean postpositions
with English prepositions; this approach often produces incorrect and extremely awkward
translations.

In dealing with the issue of prepositions vs postpositions, the interlingua approach has an
advantage, because each of the various meanings of a particular English preposition can be
mapped to a different semantic representation. If a transfer approach is used, each preposition
can be mapped to only one meaning, which often results in incorrect and/or
awkward translations.
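The contrast between the two approaches can be sketched as two lookup tables. In this hypothetical Python sketch, the interlingua side keys postpositions on a semantic role that the analysis is assumed to have placed in the frame. The transfer side keys on the preposition alone. The role names are assumptions. The postpositions "E" (as in "HagGyoE," "to school") and "GgaJi" (as in "GoJiGgaJi," "up to the hilltop") come from examples in this report.

```python
# Hypothetical tables. Interlingua: the semantic frame is assumed to record
# what "to" means in context, so the generator keys on (preposition, role).
BY_ROLE = {
    ("to", "destination"): "E",    # as in HagGyoE, "to school"
    ("to", "limit"): "GgaJi",      # as in GoJiGgaJi, "up to the hilltop"
}

# Transfer: one default postposition per preposition, whatever it means.
BY_WORD = {"to": "E"}


def interlingua_postposition(noun, prep, role):
    # A missing (prep, role) pair raises KeyError: Korean may lack a match.
    return noun + BY_ROLE[(prep, role)]


def transfer_postposition(noun, prep):
    return noun + BY_WORD[prep]


print(interlingua_postposition("HagGyo", "to", "destination"))  # HagGyoE
print(interlingua_postposition("GoJi", "to", "limit"))          # GoJiGgaJi
print(transfer_postposition("GoJi", "to"))  # GoJiE, whatever "to" meant
```

The KeyError path corresponds to the case raised in the text, where Korean has no postposition for a sense that the analysis distinguished.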

4.5  Mapping  Approach 

The one-to-one mapping approach without sufficient analysis, and therefore with inadequate semantic
frames, causes another problem, which is best illustrated with an example.
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Consider the English phrase “TRY TO OBTAIN.” “TO OBTAIN” corresponds to GuHaGiReul
and “TRY” corresponds to SiDoHaDa. Combining these two gives GuHaGiReul SiDoHaDa.
This is a root form, however, and an extra “n” needs to be added to the second-to-last syllable to
make GuHaGiReul SiDoHanDa, which is a present tense verb. Even so, this still sounds awkward,
because the natural way of saying “TRY TO OBTAIN” is GuHaRyeoHanDa, with GuHaRyeoHaDa
as its root. Therefore, to produce the more natural output GuHaRyeoHanDa, the semantic
frame would have to be complete enough to represent not only the meaning of each word but also
the meaning of the phrase that the word belongs to. In addition, the mapping in the language
generation component would have to contain such cases as well.

Given that the parser is robust enough to identify such verbal phrases, and that the semantic
frame can also embrace the meanings of such phrases, the following approach can be taken in
modifying the language generation component to handle such verbal phrases. This approach is very similar
to the one suggested for handling negations and passive voice sentences, discussed in Section
1.3.1 of Section 1: treat “TRY” as a modal that triggers a mechanism specifying
which verbal inflection to use for each of the eight verbal categories. For the example previously
cited, the corresponding inflection would be “RyeoHanDa,” with “GuHa” as the root of the verb.
As with negations and passive voice, a mechanism that handles multiple
inflections will be needed for verbs that have more than one mode, such as future
“TRY” verbs.
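Treating "TRY" as a modal can be sketched as a lookup from (mode, verb category) to an inflection that replaces the word-by-word rendering. In this hypothetical Python sketch, the category label "V" and the table layout are assumptions. Only the inflection "RyeoHanDa," taken from the natural rendering "GuHaRyeoHanDa," and the forms "GuHaGiReul" and "SiDoHanDa" come from the example above.

```python
# Hypothetical modal table: (mode, verb category) -> the inflection that
# replaces a separate "TRY" verb. The category label "V" is an assumption.
MODAL_INFLECTIONS = {
    ("try", "V"): "RyeoHanDa",
}


def word_by_word(stem):
    """One-to-one mapping: "TO OBTAIN" and "TRY" become two separate verbs."""
    return stem + "GiReul SiDoHanDa"


def with_modal(stem, category, mode):
    """Modal approach: one verb whose ending carries the meaning "try to"."""
    return stem + MODAL_INFLECTIONS[(mode, category)]


print(word_by_word("GuHa"))             # GuHaGiReul SiDoHanDa
print(with_modal("GuHa", "V", "try"))   # GuHaRyeoHanDa
```

Handling a future "TRY" verb would need a key that combines modes, for example ("try", "future"). That is the multiple-mode mechanism the text calls for, and it is not shown here.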

Even with all these modifications, the final output does not sound quite natural. A typical
Korean speaker would say the phrase in the present progressive, “GuHaRyeoNeun JungIDa,” which means
“IN THE MIDDLE OF TRYING TO OBTAIN.” Although “GuHaGiReul SiDoHanDa” for “TRY
TO OBTAIN” is not incorrect, it sounds very textual. “GuHaRyeoNeun JungIDa” would be the
most natural translation, but the current setup of the system cannot produce it.

4.6  Lexical  Incompatibility 

When translating between languages of the same family, it is relatively easy to find
equivalents. However, when English is translated into Korean, an English word can have multiple
translations in Korean, or it may have no translation at all. Refer to Section 3.3.1 of Section 3
for further discussion of this subject.
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5.  EVALUATION 


5.1  Evaluation  Procedure 

5.1.1  Data 

The transcription of a Task Force Command Net exercise was used to evaluate the performance
of the system. The data contain 1400 transcribed utterances, which have been divided into two
training sets of approximately 500 sentences each and two test sets of approximately 200 sentences
each. Initially, a grammar was developed to understand a set of 530 sentences; that is,
TINA’s rules were hand-developed based on patterns observed in the 530 training sentences. These
sentences include 497 sentences taken from the military simulation exercise transcription as well as
33 manually generated sentences. In the first evaluation, the one reported here, four native Korean
speakers evaluated CCLINC’s translations of the training sentences. (The exact evaluation method
is described in the following section.) A later evaluation used an independent
set of 190 sentences from the military simulation exercise transcription as test data. In that later
evaluation, CCLINC was able to parse 52.1% (99/190) of the test data. This is a particularly good
result, considering that TINA parses only 57.9% (288/497) of the training sentences taken from
the military exercise transcription. The conclusion is that part of the coalition brigade domain has
been covered quite well. More detailed results of these experiments are reported in Tummala et al. [4].

5.1.2  Method 

The data contain 530 sentences, of which 325 are distinct. Redundant sentences were
discarded for the purpose of evaluation. CCLINC’s translations of the 325 sentences are categorized
under two headings: unparsed or parsed. The parsed sentences are evaluated on how closely
the meaning has been preserved (adequacy) and how fluent the translation sounds (fluency). This
evaluation was carried out by four native Korean speakers, who scored each translation from 5 to
1, 5 being the best and 1 the worst. The four scores for each translation were averaged and
rounded to an integer.
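The averaging step can be written out as a small function. The Python sketch below is illustrative only. The report does not say how halves (e.g., an average of 4.5) were rounded, so rounding halves up is an assumption. Python's built-in round() is deliberately avoided because it rounds halves to the nearest even integer.

```python
import math


def final_score(grades):
    """Average the graders' 1-5 scores and round to the nearest integer.

    Halves are rounded up (an assumption; the report does not specify),
    which is why math.floor(x + 0.5) is used instead of round().
    """
    if not grades or any(g not in (1, 2, 3, 4, 5) for g in grades):
        raise ValueError("expected one or more scores from 1 to 5")
    return math.floor(sum(grades) / len(grades) + 0.5)


print(final_score([5, 5, 4, 4]))   # 5 (average 4.5)
print(final_score([5, 4, 4, 4]))   # 4 (average 4.25)
```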

5.1.3  Scores 

The results are shown in Table 2, Figure 6, and Figure 7. Note that 201 sentences (61.85% of
the 325) failed to parse.
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TABLE  2 
Evaluation Scores

Score   Adequacy   Percentage of   Percentage of   Fluency   Percentage of   Percentage of
        Count      All Data        Parsed Data     Count     All Data        Parsed Data
1          0          0               0                0         0               0
2          1          0.31            0.81             2         0.62            1.61
3          5          1.54            4.03            13         4.00           10.48
4         20          6.15           16.13            29         8.92           23.39
5         98         30.15           79.03            80        24.61           64.52
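The percentages in Table 2 follow directly from the counts. The short Python check below is only an illustration. It recomputes them from the 325 distinct sentences and the 124 parsed ones (325 minus the 201 unparsed).

```python
TOTAL = 325                 # distinct sentences evaluated
UNPARSED = 201
PARSED = TOTAL - UNPARSED   # 124

# Sentence counts per score (1 = worst, 5 = best), read from Table 2.
ADEQUACY = {1: 0, 2: 1, 3: 5, 4: 20, 5: 98}
FLUENCY = {1: 0, 2: 2, 3: 13, 4: 29, 5: 80}


def percentages(counts):
    """Map each score to (% of all data, % of parsed data), 2 decimals."""
    return {score: (round(100 * n / TOTAL, 2), round(100 * n / PARSED, 2))
            for score, n in counts.items()}


# Every parsed sentence receives exactly one adequacy and one fluency score.
assert sum(ADEQUACY.values()) == sum(FLUENCY.values()) == PARSED

print(round(100 * UNPARSED / TOTAL, 2))   # 61.85
print(percentages(ADEQUACY)[5])           # (30.15, 79.03)
print(percentages(FLUENCY)[4])            # (8.92, 23.39)
```

Recomputed this way, every entry matches the table except one. For fluency score 5 as a percentage of all data, 80/325 gives 24.62 rather than the tabulated 24.61.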

[Figure 6 histogram: x-axis TRANSLATION SCORE, 0 (UNPARSED), 1 (WORST) to 5 (BEST).]

Figure 6. Adequacy score histogram (0 indicates unparsed).
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Figure  7.  Fluency  score  histogram  (0  indicates  unparsed). 


5.2  Analysis 

Although the majority of the translations of the parsed sentences scored 5 for both adequacy
and fluency, a rather large number of parsed sentences resulted in unsatisfactory translations. The
unsatisfactory translations can be caused by insufficient analysis of input sentences by
TINA, by inadequacies of GENESIS for Korean, or by other linguistic phenomena unrelated
to TINA and GENESIS.

Table 3 shows the sources of errors and their distributions. The errors cause the translations
to be either inadequate or disfluent. The distributions are the number of times each error
category occurred. These numbers do not come directly from the scores listed in Table 2; they were
obtained by analyzing the translations with scores below 5 and counting the occurrences of each
error. The following sections illustrate problems caused by each of the categories
in Table 3.
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TABLE  3 

Occurrences  of  Each  Error  Source 


Insufficient  analysis  of  TINA 

Inadequacies  of  GENESIS 

-fixable  by  changing  rules 

6 

-require  code  modification 

6 

Other 

7 

5.2.1  Insufficient  Analysis  of  TINA 

MOVE  BATTALION  DELTA  TO  HILLTOP  CHARLIE  -  input 

ChoalLi  GoJiGgaJi  DelTa  DaiDaiReul  UmJigIRa  -  translation 

STIR  battalion  delta  UP  TO  hilltop  Charlie  -  translation  in  English 

There are two problems with the translation, and both stem from insufficient analysis. The
problems are in effect one, in the sense that both arise from lexical ambiguity. “Move,” for
example, can mean “to go from one point to another,” “to change one’s residence,” or “to stir,” etc.
The preposition “to” also assumes multiple roles. TINA is certainly capable of distinguishing the
different meanings of a particular word, so fixing this kind of problem should be an easy task.

WE ARE OBSERVING THE ENEMY ON THE NORTH AND THE WEST - input
URiNeun Bug HaGo SeoESeo JeogGunEul GoanChalHanDa - translation
we OBSERVE the enemy on the north and the west - translation in English

MINEFIELD DISCOVERED NEAR SECTOR ALPHA - input
URiNeun AlPa GuYeog GeunCheoE JiRoiReul BalGyeonHanDa - translation
we DISCOVER minefield near sector alpha - translation in English

WE ARE ENGAGED - input
URiNeun GoChagHanDa - translation
we ENGAGE - translation in English
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The  three  examples  above  contain  problems  caused  by  ignoring  that  the  sentences  are  either 
in the present progressive, in the past tense, or in the passive voice. As with the first example, exploiting TINA’s
capability would resolve this kind of problem.


I GOT FOUR BMPS OVER - input
NaNeun SoRyeon Gyeong JangGabCha SaReul SoYoHanDa ISang - translation
i POSSESS four bmps over - translation in English


Again, the word “got” has multiple meanings, and the analysis failed to pick the correct
one.

SEND  AGAIN  -  input 

BanBogHaRa  -  translation 

REPEAT  -  translation  in  English 


TINA  parsed  this  input  to  mean  “repeat.”  Because  of  this  incorrect  parse,  the  translation  is 
also  incorrect. 

5.2.2  Fixable  by  Changing  Rules  of  GENESIS 

REQUEST PERMISSION TO DEFEND HILLTOP ECHO - input
URiNeun EKoGoJiReul ChugSeongHaGiReul HeoGaReul YoGuHanDa - translation

Redundant use of the postposition “Reul” makes awkward a translation that would otherwise be
fluent. Instead of attaching a postposition to every object in the messages file, postpositions
should be used only when they are absolutely necessary.


OH  WAIT  -  input 
A  GiDaRiRa  -  translation 
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The  problem  with  this  translation  was  pointed  out  by  a  grader.  Authoritative  commands  in 
Korean can be classified into two categories. One has either an “EoRa” or a “YeoRa” ending,
whereas the other usually has an “ARa,” “EuRa,” or “IRa” ending. The former inflection is used
by most people, including civilians and off-duty military personnel, and is considered the standard
inflection for authoritative command sentences. “ARa” and “EuRa” are almost never used by
civilians in normal conversation or writing; if used at all, they would be used by military personnel.
One of the graders commented that the latter inflections are rarely used in Korea now and
would sound awkward even to military personnel. The more natural translation would be “A
GiDaRyeoRa,” and this can easily be fixed in the lexicon file.
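The lexicon fix amounts to choosing the "EoRa"/"YeoRa" family of endings for commands. The Python sketch below is hypothetical and deliberately narrow. It handles only two stem shapes. Stems ending in "Ha" take "YeoRa." Stems whose last syllable is a consonant plus "i" contract "i" + "eo" to "yeo," so GiDaRi becomes GiDaRyeoRa. Any other stem returns None rather than a guessed form. The consonant test relies on the report's convention of capitalizing each syllable's initial letter.

```python
def civilian_imperative(stem):
    """Return an "EoRa"-family command form for two stem shapes, else None.

    Hypothetical rule subset: "Ha" stems take "YeoRa"; stems whose last
    syllable is consonant + "i" contract i + eo to "yeo".
    """
    if stem.endswith("Ha"):
        return stem + "YeoRa"                 # BoGoHa -> BoGoHaYeoRa
    if len(stem) >= 2 and stem[-1] == "i" and stem[-2].isupper():
        return stem[:-1] + "yeoRa"            # GiDaRi -> GiDaRyeoRa
    return None                               # other stems are not covered


print(civilian_imperative("GiDaRi"))   # GiDaRyeoRa
print(civilian_imperative("BoGoHa"))   # BoGoHaYeoRa
print(civilian_imperative("Nai"))      # None ("ai" is a vowel, not consonant + i)
```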


AFFIRMATIVE  -  input 
DanJeongJeogIDa  -  translation 

Although  “DanJeongJeogIDa”  is  not  an  incorrect  translation,  “GeuReohDa”  would  be  a 
better  translation  because  it  is  more  widely  used. 

5.2.3 Requiring Code Modification of GENESIS

One of the most difficult problems in Korean language generation is choosing
the right inflectional endings for verbs. The following example illustrates this point.


I  AM  TRYING  TO  GET  A  GRID  NOW  -  input 

NaNeun  JoiPyoReul  JiGeum  GuHaGiReul  SiDoHanDa  -  translation 


This translation problem occurs because GENESIS maps “trying to” and
“get” to two different words, whereas the natural translation uses one verb for “get” with an
inflectional ending that incorporates the meaning of “trying to.” Refer to Section 4.5 of Section 4 for
an in-depth discussion.

5.2.4  Other 

This subsection focuses on problems that occur not because of inadequacies
of TINA or GENESIS, but because of the very different linguistic natures of English and
Korean. These problems pose the greatest difficulty in translating from English into Korean.

LEAD ELEMENTS OF MY UNIT NOW PASSING PHASE LINE ALPHA - input
Nai BuDaiEui SeonDuBuDaiNeun JiGeum AlPa TongGyeSeonEul TongGoaHanDa - translation
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This example shows a cultural difference reflected in the languages. What is considered
to belong to a person in English is often thought of as belonging to a group in Korean. Although the
translation is both adequate and fluent, a more natural translation would use “URi,” meaning
“our,” instead of “Nai,” meaning “my.”

FIRST BATTALION COMMANDER REPORT YOUR LOCATION - input
CheossJjai DaiDaiEui BuDaiJang Ne JangSo BoGoHaRa - translation

In English it is natural to use an ordinal number in expressions such as “first battalion.” For
such expressions, however, Koreans use a cardinal number instead, in effect saying “three battalion”
for “third battalion.”


WE ARE NOW GOING TO GET INTO THEIR MAIN DEFENSIVE BELT - input
URiNeun JiGeum GeuDeulEui JuYoHan BangEoYoDaiReul ChimTuHaGiReul GanDa - translation

Contracted forms are used very frequently in Korean. Although “JuYoHan BangEoYoDai” is
both adequate and fluent, “JuBangEoYoDai” for “main defensive belt” sounds even
more fluent.

ONE BMP AND ONE SAGGER TEAM OVER - input
SoRyeon Gyeong JangGabCha Il HaGo SaGaTim Il ISang - translation

As discussed in Section 3.3.1 of Section 3, there are two ways of reading Arabic numerals.
When numbers are used to count items, as in this case, native Korean numerals are used, so “HaNa”
should be used instead of “Il” for “one.”
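The choice between the two numeral systems can be sketched as a lookup keyed on whether the number counts items. In this Python sketch, the Sino-Korean readings are taken from the Appendix A lexicon. The native counting numerals are romanized in the same style as the lexicon's ordinals. The function names and the "counting" flag are hypothetical; deciding when a number counts items is assumed to be done by the analysis.

```python
# Sino-Korean readings as listed in the Appendix A lexicon (5 is the letter O).
SINO = {0: "Yeong", 1: "Il", 2: "I", 3: "Sam", 4: "Sa",
        5: "O", 6: "Yug", 7: "Chil", 8: "Pal", 9: "Gu"}
# Native Korean counting numerals, romanized in the same style as the
# lexicon's ordinals ("DulJjai", "SesJjai", ...). Native Korean has no zero.
NATIVE = {1: "HaNa", 2: "Dul", 3: "Ses", 4: "Nes", 5: "DaSeos",
          6: "YeoSeos", 7: "IlGob", 8: "YeoDeolb", 9: "AHob"}


def read_digit(n, counting):
    """Native numeral when counting items, Sino-Korean reading otherwise."""
    return NATIVE[n] if counting else SINO[n]


def count_phrase(items):
    """Render (noun, count) pairs as 'noun number HaGo noun number'."""
    parts = [f"{noun} {read_digit(n, counting=True)}" for noun, n in items]
    return " HaGo ".join(parts)


print(count_phrase([("SoRyeon Gyeong JangGabCha", 1), ("SaGaTim", 1)]) + " ISang")
# SoRyeon Gyeong JangGabCha HaNa HaGo SaGaTim HaNa ISang
```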

5.3  Conclusion 

The scores on the translations of the evaluated sentences indicate that nearly 80% of the parsed
sentences have reasonable adequacy and nearly 65% of the parsed sentences have acceptable fluency.
Most of the problems affecting the remaining parsed sentences arise either from underutilizing
the capabilities of TINA and GENESIS or from their early stage of development. With improved rules and
augmented code for TINA and GENESIS, future evaluations are expected to yield better scores.
The problems discussed in Section 5.2.4, however, pose serious difficulties for translation and
require further research.
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6.  DISCUSSION  AND  FUTURE  PLANS 


This report describes the degree to which GENESIS is able to handle Korean language generation
in an interlingua system. The system has been trained and tested on a transcription
of a Task Force Command Net exercise. The two evaluation measures, adequacy and fluency,
indicate that nearly 80% of the parsed sentences are reasonably good translations, in the sense that
they carry the correct meaning of the original sentences, and that approximately 65% of the parsed
sentences sound natural to native Korean speakers.

The current system is, however, still evolving. The internal engines of TINA and
GENESIS are constantly being improved to handle new and more complex sentences. The grammar rules
for TINA are being developed further to accommodate linguistic phenomena that the current rules
cannot handle, such as negations, passive voice, and articles. Along with these improved
rules, a more exhaustive semantic frame is being developed, which would resolve the lexical
ambiguities of the source language.

Given the improved setup for TINA and the new semantic frame, better parsing can be
expected, making correct language generation a more feasible task. Given correct
parses, the adequacy measure can be expected to improve drastically, since even a string of correct
Korean equivalents of the English input would allow one to extract the intended meaning of the
input sentences. Improving the other measure, fluency, is believed to be a more difficult task.

Even when parsing has been done correctly, generating a Korean translation by putting together the
right nouns, adequate postpositions, and verbs with appropriate inflections might produce very
awkward output. This awkwardness can arise for several reasons. One obvious reason is
that English idiomatic expressions may produce totally unrelated strings of Korean words when
translated in the way described. Another is that a natural Korean expression might employ a
set of words for which no equivalent English expression exists. For example, the natural
Korean translation of “can you buy it for me?” reads, when translated back into English, “can you
buy and give it to me?” Because of dissimilarities such as these between the two languages,
achieving fluent Korean translation is believed to be a hard task.

The evaluation process will also need to be augmented. One tendency noticed
when evaluating some preliminary translations is that the evaluators became used to the
translation patterns, so that as the evaluation progressed they unconsciously began to judge the
translations as more correct. To prevent this, evaluators would need to be
divided into two groups: one group would be given the translations on paper, and the other group
would listen to a Korean speech synthesizer.

A Korean speech synthesizer named “Says,” produced by Digicom (in Korea), was acquired
for this purpose but has not been completely installed because a required software component is
currently lacking. Once it is incorporated into the system, the evaluation procedure outlined above
will be possible.
APPENDIX A
LEXICON FOR GENESIS


TABLE A-1

Lexicon File for GENESIS

A A "HaNaEui"
close A "GaGgaUn"
defensive A "BangEo"
front A "ApEui"
left A "OinJjog"
right A "OReunJjog"
heavy A "DaiGyuMoEui"
1 A "Il" CARDINAL "CheosJjai"
quick A "BbaReun"
rough A "GeoChilEun"
main A "JuYoHan"
this_is A "YeoGiNeun"
unknown A "AlRyeoJiJi AnhAxDa"

at_time C " "
and C "HaGo"
or C "INa"

are X " "
command1 CL "CL" MODE "imp1"
command2 CL "CL" MODE "imp2"
is X " "
to X " " MODE "root"
when CL " " MODE "case"
will X " " MODE "future"

def D " "
first D "CheosJjai"
indef D " "
my D "Nai"
no_det D " "
some D "JoGeum"
your D "Ne"
his D "GeuEui"
our D "URiEui"
their D "GeuDeulEui"
0 D "Yeong"
2 D "I" CARDINAL "DulJjai"
3 D "Sam" CARDINAL "SesJjai"
4 D "Sa" CARDINAL "NesJjai"
5 D "O" CARDINAL "DaSeosJjai"
6 D "Yug" CARDINAL "YeoSeosJjai"
7 D "Chil" CARDINAL "IlGobJjai"
8 D "Pal" CARDINAL "YeoDeolbJjai"
9 D "Gu" CARDINAL "AHobJjai"

N N " "
a-4 N "a-4"
a-6 N "a-6"
a-7 N "a-7"
a-10 N "a-10"
air N "HangGong"
air_alert N "GongSeubGyeongBo"
air_combat_fighter N "JeonTuGi"
air_strike N "DaiGongGongGyeog"
air_support N "HangGongJiUeon"
airplane N "BiHaingGi"
alligator N "AgEo"
aloe N "HangGong ByeongChamSeon"
alpha N "AlPa"
alpha_bravo N "AlPa BeuRaBo"
ammo_status N "TanYag SangTai"
artillery N "PoByeong"
attack N "GongGyeog"
attention N "JuEui"
battalion N "DaiDai"
bear N "Gom"
belt N "YoDai"
bmp N "SoRyeon Gyeong JangGabCha"
bmp_team N "SoRyeon Gyeong JangGabCha Pyeon"
bravo N "BeuRaBo"
bridge N "GyoRyang"
bridge_report N "GyoRyang BoGo"
Charlie N "ChoalRi"
checkpoint N "GeomMunSo"
cheetah N "ChiTa"
commander N "BuDaiJang"
company N "JungDai"
contact N "JeobChog"
coordinated_attack N "HyeobDongGongGyeog"
corsair N "HaiJeogSeon"
crocodile N "AgEo"
defensive_belt N "BangEoYoDai"
delta N "DelTa"
delta_Charlie N "DelTa ChoalRi"
digger N "GaingBu"
dismount N "NagChaGun"
dragon N "Yong"
eagle N "DogSuRi"
east N "Dong"
echo N "EKo"
element N "YoSo"
enemy N "JeogGun"
eta N "YeSangDoChagSiGan"
forces_motorized N "GiDongByeongRyeog"
foxtrot N "PogSeuTeuRosTeu"
fran N "PeuRain"
ghostrider N "YuRyeongGiSa"
grid N "JoaPyo"
gunners N "SaSu"
hawk N "Mai"
hilltop N "GoJi"
hotel N "HoTel"
id N " "
infantry N "BoByeong"
intruder N "ChimIbJa"
Juliet N "JyulRiEs"
laying N "SeolChi"
lead_element N "SeonDuBuDai"
leopard N "PyoBeom"
lieutenant N "JungUi"
line N "Seon"
lion N "SaJa"
location N "JangSo"
mine N "JiRoi"
minefield_laying_report N "JiRoiBat SeolChi BoGo"
motorized_forces N "GiDongByeongRyeog"
nbc_alert N "HoaSaingBang GyeongGo"
north N "Bug"
november N "NoBemBeo"
object N " "
objective N "MogPyo"
op N "OPi"
operation N "JagJeon"
overwatch N "GamSi"
panther N "PyoBeom"
passability N "TongGoaSeong"
permission N "HeoGa"
phase_line N "TongJeSeon"
platoon N "SoDai"
positive N "GeuReohDa"
ranger N "YuGyeogByeong"
rear N "Dui"
resistance N "JeoHang"
rhino N "KoBbulSo"
ridge N "SanMaRu"
road N "Gil"
saber N "GiByeongDai"
sagger N "SaGa"
sagger_team N "SaGaTim"
scorpion N "JeonGal"
sector N "GuYeog"
shark N "SangEo"
sitrep N "SangHoangBoGo"
snake N "Baim"
soldier N "GunIn"
south N "Nam"
t-72 N "t-72"
tank N "TaingKeu"
team N "Pyeon"
terrain N "JiYeog"
that N "JeoGeos"
this N "IGeos"
tiger N "HoRangI"
toe N "JeonSul JagJeonBonBu"
troops N "GiGab JungDai"
unit N "BuDai"
vulture N "DogSuRi"
west N "Seo"
wolf N "NeugDai"
P N "P" G "m"

he PN "Geu" NUM "third"
him PN "Geu" NUM "third"
her PN "GeuNyeo" NUM "third"
PN "Na" NUM "first"
it_obj PN "GeuGeos" NUM "third"
it_subj PN "GeuGeos" NUM "third"
people PN "SaRamDeul" NUM "third"
pro PN "Na" SECOND "Neo" FPL "URi" SECOND "NeoHeui"
we PN "URi" NUM "fpl"
she PN "GeuNyeo" NUM "third"
them PN "GeuDeul" NUM "pl"
they PN "GeuDeul" NUM "pl"
you_obj PN "Neo" NUM "second"
you_subj PN "Neo" NUM "second"

affirmative O "DanJeongJeogIDa"
at_this_time O "JiGeum ISiGan"
break O "JungJiHanDa"
copy O "AlAxDa"
heavily O "GyeogRyeolHaGe"
negative O "ANiDa"
no O "ANiDa"
now O "JiGeum"
oh O "A"
ok O "JohA"
okay O "JohA"
over O "ISang"
pretty O "SangDangHi"
quickly O "BbaReuGeo"
roger O "AlAxDa"
roger_that O "AlAxDa"
yea O "GeuReohDa"
yes O "GeuReohDa"


V V "V" ROOT "HaGiReul" ING "HaGo IxDa" IMP1 "HaRa"
IMP2 "HaJa" FUTURE "Hal GeosIDa" CASE "Hal Ddai" FIRST "HanDa"
SECOND "HanDa" THIRD "HanDa" PL "HanDa" FPL "HanDa"

V2 V2 "V2" ROOT "GiReul" ING "Go IxDa" IMP1 "EuRa"
IMP2 "Ja" FUTURE "Eul GeosIDa" CASE "Eul Ddai" FIRST "NeunDa"
SECOND "NeunDa" THIRD "NeunDa" PL "NeunDa" FPL "NeunDa"

V3 V3 "V3" ROOT "GiReul" ING "Go IxDa" IMP1 "ARa"
IMP2 "Ja" FUTURE "Eul GeosIDa" CASE "Eul Ddai" FIRST "NeunDa"
SECOND "NeunDa" THIRD "NeunDa" PL "NeunDa" FPL "NeunDa"

V4 V4 "V4" ROOT " SGiReul" ING " SGo IxDa" IMP1 "EuRa"
IMP2 " SJa" FUTURE "Eul GeosIDa" CASE "Eul Ddai" FIRST " SNeunDa"
SECOND " SNeunDa" THIRD " SNeunDa" PL " SNeunDa" FPL " SNeunDa"

V5 V5 "V5" ROOT " DGiReul" ING " DGo IxDa" IMP1 " REoRa"
IMP2 " DJa" FUTURE " REul GeosIDa" CASE " REul Ddai" FIRST " DNeunDa"
SECOND " DNeunDa" THIRD " DNeunDa" PL " DNeunDa" FPL " DNeunDa"

V6 V6 "V6" ROOT " BGiReul" ING " BGo IxDa" IMP1 "UeoRa"
IMP2 " BJa" FUTURE "Eul GeosIDa" CASE "Eul Ddai" FIRST " BNeunDa"
SECOND " BNeunDa" THIRD " BNeunDa" PL " BNeunDa" FPL " BNeunDa"

V10 V10 "V10" ROOT "GiReul" ING "Go IxDa" IMP1 "Ra"
IMP2 "Ja" FUTURE " R GeosIDa" CASE " R Ddai" FIRST " NDa"
SECOND " NDa" THIRD " NDa" PL " NDa" FPL " NDa"

V11 V11 "V11" ROOT "GiReul" ING "Go IxDa" IMP1 "Ra"
IMP2 "Ja" FUTURE " R GeosIDa" CASE " R Ddai" FIRST " NDa"
SECOND " NDa" THIRD " NDa" PL " NDa" FPL " NDa"

approach V11 "DaGaGa"
be V10 " "
begin V "SiJag"
call V11 "BuReu"
cross V11 "GeonNeo"
destroy V "PaGoi"
discover V "BalGyeon"
encounter V10 "ManNa"
engage V "GoChag"
engaged_with V "GoChag"
fortify V "ChugSeong"
go V11 "Ga"
leave V11 "DdeoNa"
monitor V "GamCheong"
move V10 "UmJigI"
observe V "GoanChal"
obtain V "Gu"
pass V "TongGoa"
pay V "JiBul"
pay_attention V "JuEui"
penetrate V "ChimTu"
possess V "SoYu"
receive V3 "Bad"
repeat V "BanBog"
report V "BoGo"
v_request V "YoGu"
sight V "MogGyeog"
signal V "SinHo"
take_action V3 "Mat"
take_over V3 "InGyeBad"
think V "SaingGag"
try V "SiDo"
use V "SaYong"
wait V11 "GiDaRi"
want V "Ueon"
wave V10 "HeunDeul"
laugh V2 "Us"
bury V2 "Mud"
bend V2 "Gub"
draw V4 "Geu"
ask V6 "Mu"
roast V6 "Gu"

APPENDIX  B 
MESSAGES  FOR  GENESIS 
TABLE B-1

Messages  File  for  GENESIS 

command1 :OPENING :ID1 :TOPIC :PREDICATE :ID2 :CLOSING

command2 :OPENING :ID1 :TOPIC :PREDICATE :ID2 :CLOSING

statement :OPENING :ID1 (:TOPIC pro) *Eun :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE :ID2 :CLOSING
call_up :OPENING :ID1 :TOPIC :PREDICATE :CLOSING

reply :OPENING :TOPIC :CVC2_MSG :PREDICATE :CLOSING

topic :QUANTIFIER :COMPLEMENT :NOUN_PHRASE

np-and :NOUN_PHRASE :TOPIC
and :NOUN_PHRASE :TOPIC
np-or :NOUN_PHRASE :TOPIC
conjunction :TOPIC1 :CONJUNCTION :TOPIC2

near :TOPIC GeunCheoE
np-near :TOPIC GeunCheoE :NOUN_PHRASE
np-of :TOPIC Eui :NOUN_PHRASE
of :TOPIC

np-adj_intensity :TOPIC :NOUN_PHRASE
adj_intensity :TOPIC
np-directional :TOPIC :NOUN_PHRASE
directional :TOPIC
at :TOPIC ESeo
np-at :TOPIC ESeo :NOUN_PHRASE

np-degree :TOPIC :NOUN_PHRASE
degree :TOPIC
np-up_to :TOPIC GgaJi :NOUN_PHRASE
up_to :TOPIC
np-from :TOPIC ESeo :NOUN_PHRASE :PREDICATE
from :TOPIC :PREDICATE
np-to :TOPIC :PREDICATE :NOUN_PHRASE
to :TOPIC :PREDICATE

this_is :PREDICATE :TOPIC :ID2
begin :OBJECT_PRONOUN :ADV_WHEN :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE :TOPIC

take_action :OBJECT_PRONOUN :TOPIC *Eul :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

take_over :OBJECT_PRONOUN :TOPIC :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

np-take_over :NOUN_PHRASE :OBJECT_PRONOUN :TOPIC
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

pass :OBJECT_PRONOUN :ADV_WHEN :TOPIC *Eul
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

np-pass :NOUN_PHRASE :OBJECT_PRONOUN :AUX :TOPIC *Eul
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

pay_attention :OBJECT_PRONOUN :TOPIC :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

np-pay_attention :NOUN_PHRASE :OBJECT_PRONOUN :TOPIC
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

phone :OBJECT_PRONOUN :ADV_WHEN :TOPIC *Eul
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

np-phone :NOUN_PHRASE :OBJECT_PRONOUN :AUX :TOPIC *Eul
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

report :OBJECT_PRONOUN :ADV_CLAUSE :TOPIC
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

go :OBJECT_PRONOUN :COMPLEMENT :ADV_WHEN
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

cross :OBJECT_PRONOUN :TOPIC *Eul :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

np-cross :NOUN_PHRASE :OBJECT_PRONOUN :TOPIC *Eul
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

when :OBJECT_PRONOUN :TOPIC :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

sight :OBJECT_PRONOUN :TOPIC *Eul :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

observe :OBJECT_PRONOUN :TOPIC *Eul :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

try :OBJECT_NOUN :ADV_WHEN :COMPLEMENT
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

obtain :OBJECT_PRONOUN :TOPIC *Eul :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

np-obtain :NOUN_PHRASE :OBJECT_PRONOUN :TOPIC *Eul
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

request :OBJECT_PRONOUN :TOPIC *Eul :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

np-request :NOUN_PHRASE :OBJECT_PRONOUN :TOPIC *Eul
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

move :OBJECT_PRONOUN :TOPIC *Eul :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

np-move :NOUN_PHRASE :OBJECT_PRONOUN :TOPIC *Eul
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

call :OBJECT_PRONOUN :TOPIC *Eul :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

np-call :NOUN_PHRASE :OBJECT_PRONOUN :TOPIC *Eul
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

leave :OBJECT_PRONOUN :ADV_WHEN :TOPIC
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

encounter :OBJECT_PRONOUN :TOPIC *Eul :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

np-encounter :NOUN_PHRASE :OBJECT_PRONOUN :TOPIC *Eul
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

discover :OBJECT_PRONOUN :TOPIC *Eul :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

destroy :OBJECT_PRONOUN :TOPIC *Eul :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

np-destroy :NOUN_PHRASE :OBJECT_PRONOUN :TOPIC *Eul
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE
fortify :OBJECT_PRONOUN :AUX :TOPIC *Eul :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

np-fortify :NOUN_PHRASE :OBJECT_PRONOUN :AUX :TOPIC *Eul
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

approach :OBJECT_PRONOUN :TOPIC *Eul :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

engage :OBJECT_PRONOUN :TOPIC :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

engaged_with :OBJECT_PRONOUN :TOPIC :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

penetrate :OBJECT_PRONOUN :TOPIC *Eul :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

np-penetrate :NOUN_PHRASE :OBJECT_PRONOUN :AUX :TOPIC *Eul
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

possess :OBJECT_PRONOUN :TOPIC *Eul :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

np-possess :NOUN_PHRASE :OBJECT_PRONOUN :TOPIC *Eul
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

receive :OBJECT_PRONOUN :TOPIC :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

np-receive :NOUN_PHRASE :OBJECT_PRONOUN :TOPIC
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

repeat :OBJECT_PRONOUN :TOPIC :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

np-repeat :NOUN_PHRASE :OBJECT_PRONOUN :TOPIC
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

v_request :OBJECT_PRONOUN :TOPIC *Eul :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

np-v_request :NOUN_PHRASE :OBJECT_PRONOUN :TOPIC *Eul
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

use :OBJECT_PRONOUN :TOPIC *Eul :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

np-use :NOUN_PHRASE :OBJECT_PRONOUN :TOPIC *Eul
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

wait :OBJECT_PRONOUN :TOPIC :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

np-wait :NOUN_PHRASE :OBJECT_PRONOUN :TOPIC
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

want :OBJECT_PRONOUN :ID1 *Eul :TOPIC :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

np-want :NOUN_PHRASE :OBJECT_PRONOUN :ID1 *Eul
:TOPIC :ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

wave :OBJECT_PRONOUN :TOPIC :ADV_DEGREE
:ADV_MAIN :ADV_SOLE :PREDICATE

np-wave :NOUN_PHRASE :OBJECT_PRONOUN :TOPIC
:ADV_DEGREE :ADV_MAIN :ADV_SOLE :PREDICATE

terrain_type :TOPIC
np-terrain_type :TOPIC :NOUN_PHRASE
support_type :TOPIC
np-support_type :NOUN_PHRASE :TOPIC
kind :TOPIC
np-kind :TOPIC :NOUN_PHRASE

unknown :TOPIC :ADV_WHEN :ADV_SOLE :PREDICATE

np-at_time :TOPIC :PREDICATE :NOUN_PHRASE
np-mil_time :TOPIC SiE :NOUN_PHRASE

;; strange ordering is needed since the order of predicates at the
;; same level is determined by their relative order in this file

code1 :NOUN_PHRASE :TOPIC
np-code1 :NOUN_PHRASE :TOPIC
code4 :NOUN_PHRASE :TOPIC
np-code4 :NOUN_PHRASE :TOPIC
digit_code1 :TOPIC
np-digit_code1 :NOUN_PHRASE :TOPIC
digit_code2 :TOPIC
np-digit_code2 :NOUN_PHRASE :TOPIC
code2 :NOUN_PHRASE :TOPIC
np-code2 :NOUN_PHRASE :TOPIC
code3 :NOUN_PHRASE :TOPIC
np-code3 :NOUN_PHRASE :TOPIC
code5 :NOUN_PHRASE :TOPIC
np-code5 :NOUN_PHRASE :TOPIC
code6 :NOUN_PHRASE :TOPIC
np-code6 :NOUN_PHRASE :TOPIC
code7 :NOUN_PHRASE :TOPIC
np-code7 :NOUN_PHRASE :TOPIC
code8 :NOUN_PHRASE :TOPIC
np-code8 :NOUN_PHRASE :TOPIC
code9 :NOUN_PHRASE :TOPIC
np-code9 :NOUN_PHRASE :TOPIC
code10 :NOUN_PHRASE :TOPIC
np-code10 :NOUN_PHRASE :TOPIC
code11 :NOUN_PHRASE :TOPIC
np-code11 :NOUN_PHRASE :TOPIC

digit_code3 :TOPIC
np-digit_code3 :NOUN_PHRASE :TOPIC
digit_code4 :TOPIC
np-digit_code4 :NOUN_PHRASE :TOPIC
digit_code5 :TOPIC
np-digit_code5 :NOUN_PHRASE :TOPIC
digit_code6 :TOPIC
np-digit_code6 :NOUN_PHRASE :TOPIC
digit_code7 :TOPIC
np-digit_code7 :NOUN_PHRASE :TOPIC
digit_code8 :TOPIC
np-digit_code8 :NOUN_PHRASE :TOPIC

distance :TOPIC
np-distance :TOPIC :NOUN_PHRASE
np-cardinal :TOPIC :NOUN_PHRASE
unit_number Je :TOPIC :NAME
np-unit_number Je :TOPIC :NAME :NOUN_PHRASE
numeric :TOPIC
np-numeric :NOUN_PHRASE :TOPIC
np-nonprec_initials :TOPIC :NOUN_PHRASE
np-prec_initials :TOPIC :NOUN_PHRASE
APPENDIX C

REWRITE-RULES FOR GENESIS


TABLE C-1

Data Files Used for Rewrite.c

TABLE C-2

Program Automatically Generating Korean-Rewrite-Rules.Text


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXLENGTH 32   /* maximum length of character strings */
#define MAXDATA 100    /* maximum number of vowels or consonants */

int main(void)
{
    FILE *fp1;      /* pointer to first-consonants file */
    FILE *fp2;      /* pointer to all-vowels file */
    FILE *fp3;      /* pointer to final-consonants file */
    FILE *result;   /* pointer to chart file containing the relevant */
                    /* romanized combinations of Korean syllables */
    FILE *messy;    /* pointer to rough-romanized-chart containing the */
                    /* possible Korean syllables and the impossible */
                    /* Korean syllables represented by quoted blanks */
    FILE *clean;    /* pointer to romanized-chart containing only the */
                    /* possible Korean syllables */

    static char firstcon[MAXDATA + 1][MAXLENGTH]; /* first consonants, plus an empty one */
    static char vowel[MAXDATA][MAXLENGTH];        /* all the vowels */
    static char finalcon[MAXDATA][MAXLENGTH];     /* final consonants */
    int count1 = 0, count2 = 0, count3 = 0, counter = 0, i, j, k, l;

    char unit[MAXLENGTH];    /* quote + first consonant + vowel */
    char unit2[MAXLENGTH];   /* final consonant + quote */
    char unit3[MAXLENGTH];   /* quote + first consonant + vowel */
                             /* + final consonant + quote */

    const char testing[] = "\"\"";         /* empty quotes */
    const char testing2[] = "\"Reul\"";    /* "Reul" */
    const char testing3[] = "\"Neun\"";    /* "Neun" */

    /* Open all the necessary files to find the possible combinations. */
    /* These files include first-consonants, all-vowels, and */
    /* final-consonants. */
    /* The consonants and vowels are in romanized form. */

    if ((fp1 = fopen("first-consonants", "r")) == NULL) {
        printf("Cannot open first-consonants file.\n");
        exit(1);
    }
    while (count1 < MAXDATA && fscanf(fp1, "%31s", firstcon[count1]) == 1)
        count1++;
    firstcon[count1][0] = '\0';   /* empty first consonant: syllables that start with a vowel */
    fclose(fp1);

    if ((fp2 = fopen("all-vowels", "r")) == NULL) {
        printf("Cannot open all-vowels file.\n");
        exit(1);
    }
    while (count2 < MAXDATA && fscanf(fp2, "%31s", vowel[count2]) == 1)
        count2++;
    fclose(fp2);

    if ((fp3 = fopen("final-consonants", "r")) == NULL) {
        printf("Cannot open final-consonants file.\n");
        exit(1);
    }
    while (count3 < MAXDATA && fscanf(fp3, "%31s", finalcon[count3]) == 1)
        count3++;
    fclose(fp3);

    /* The Korean syllables get produced by permuting */
    /* first-consonants, all-vowels, and final-consonants. */
    /* If a syllable does not have a final consonant, the attached */
    /* "Eul" becomes "Reul," and "Eun" becomes "Neun." */

    if ((result = fopen("chart", "w")) == NULL) {
        printf("Cannot open chart file.\n");
        exit(1);
    }
    fprintf(result, "\\begin{sshr}\n\n");   /* this is necessary to run sshr2ks */

    for (k = 0; k < count3; k++)
        for (j = 0; j < count2; j++)
            for (i = 0; i <= count1; i++) {
                fprintf(result, "\"%s%s %s\" \"%s%s%s\"\n",
                        firstcon[i], vowel[j], finalcon[k],
                        firstcon[i], vowel[j], finalcon[k]);
                counter++;
            }

    for (j = 0; j < count2; j++)
        for (i = 0; i <= count1; i++)
            fprintf(result, "\"%s%sEul\" \"%s%sReul\"\n",
                    firstcon[i], vowel[j], firstcon[i], vowel[j]);

    for (j = 0; j < count2; j++)
        for (i = 0; i <= count1; i++)
            fprintf(result, "\"%s%sEun\" \"%s%sNeun\"\n",
                    firstcon[i], vowel[j], firstcon[i], vowel[j]);

    fprintf(result, "\\end{sshr}\n");
    fclose(result);   /* flush the chart before the external converters read it */

    /* converts romanized Korean to Korean characters; */
    /* in this process, the impossible ones become blanks */
    system("sshr2ks chart > rough-korean-chart");

    /* converting back to romanized form to do some cleaning */
    /* up of these impossible ones */
    system("ks2sshr rough-korean-chart > rough-romanized-chart");

    /* The following eliminates the impossible syllables. */
    /* They are the entries that contain either blanks, a bare */
    /* "Reul," or a bare "Neun." */

    if ((messy = fopen("rough-romanized-chart", "r")) == NULL) {
        printf("Cannot open rough-romanized-chart file.\n");
        exit(1);
    }
    if ((clean = fopen("romanized-chart", "w")) == NULL) {
        printf("Cannot open romanized-chart file.\n");
        exit(1);
    }

    fprintf(clean, "\\begin{sshr}\n");
    fscanf(messy, "%31s", unit);   /* skip the \begin{sshr} line */

    /* Three-column entries: keep those whose combined syllable (third */
    /* token) did not become empty quotes. counter is tested before */
    /* fscanf so that no token of the two-column section is consumed. */
    l = 0;
    while (l < counter &&
           fscanf(messy, "%31s %31s %31s", unit, unit2, unit3) == 3) {
        l++;
        if (strncmp(unit3, testing, 2) != 0)
            fprintf(clean, "%s %s %s\n", unit, unit2, unit3);
    }

    /* Two-column entries: drop those reduced to a bare "Reul" or "Neun" */
    while (fscanf(messy, "%31s %31s", unit, unit2) == 2)
        if (strncmp(unit2, testing2, 10) != 0 &&
            strncmp(unit2, testing3, 10) != 0)
            fprintf(clean, "%s %s\n", unit, unit2);

    fprintf(clean, "\\end{sshr}\n");
    fclose(clean);
    fclose(messy);

    /* korean-chart.text contains all the clean Korean entries now */
    system("sshr2ks romanized-chart > korean-chart.text");

    /* specials.eng has the special cases that are not */
    /* produced by simple combinations */
    system("sshr2ks specials.eng > specials.kor");

    /* these two files are concatenated */
    system("cat specials.kor korean-chart.text > korean-rewrite-rules.text");

    return 0;
}
TABLE C-3
Special.eng


" {GgaJi}"        "GgaJi"
" {ESeo}"         "ESeo"
" {SiE}"          "SiE"
"{Je} "           "Je "
" {GeunCheoE}"    " GeunCheoE"
" {Eui}"          "Eui"
" {from}"         ""
" {Eul}"          "Eul"
" Eul"            "Eul"
" {Eun}"          "Eun"
" Eun"            "Eun"
APPENDIX D
INFLECTION PATTERNS


TABLE D-1

Words Belonging to V1 "HaDa" Verbs


begin            "SiJag"
destroy          "PaGoi"
discover         "BalGyeon"
engage           "GoChag"
engaged-with     "GoChag"
fortify          "ChugSeong"
monitor          "GamCheong"
observe          "GoanChal"
pass             "TongGoa"
pay              "JiBul"
pay-attention    "JuEui"
penetrate        "ChimTu"
possess          "SoYu"
repeat           "BanBog"
report           "BoGo"
request          "YoGu"
sight            "MogGyeog"
signal           "SinHo"
think            "SaingGag"
try              "SiDo"
use              "SaYong"
TABLE D-2

Words Belonging to V2 Verbs

bury             "Mud"
TABLE D-3

Words Belonging to V3 Verbs

receive          "Bad"
take-action      "Mat"
take-over        "InGyeBad"


TABLE D-4

Words Belonging to V4 Verbs

draw             "Geus"

TABLE D-5

Words Belonging to V5 Verbs

roast            "Gub"